THE ANNALS
of
MATHEMATICAL
STATISTICS

(FOUNDED BY H. C. CARVER)

THE OFFICIAL JOURNAL OF THE INSTITUTE
OF MATHEMATICAL STATISTICS

Contents

The Elementary Gaussian Processes.
On Cumulative Sums of Random Variables. ABRAHAM WALD
Some Improvements in Weighing and Other Experimental Techniques. HAROLD HOTELLING
On the Analysis of a Certain Six-by-Six Four-Group Lattice Design. BOYD HARSHBARGER
Notes:
  On the Expected Values of Two Statistics. H. E. ROBBINS
  On Relative Errors in Systems of Linear Equations. A. T. LONSETH
  A Reciprocity Principle for the Neyman-Pearson Theory of Testing Statistical Hypotheses. LEWIS M. COURT
  An Inequality due to H. Hornich. Z. W. BIRNBAUM and HERBERT
  Note on a Lemma.
  A Note on Skewness and Kurtosis.
News and Notices
Report on the Wellesley Meeting of the Institute
Amendments to the Constitution and By-Laws of the Institute
Abstracts of Papers


Vol. XV, No. 3 — September, 1944 





THE ANNALS 
OF MATHEMATICAL STATISTICS 


EDITED BY
S. S. WILKS, Editor
A. T. CRAIG        H. HOTELLING
W. E. DEMING       J. NEYMAN
T. C. FRY          W. A. SHEWHART

WITH THE COOPERATION OF
C. EISENHART       A. M. MOOD
W. FELLER          H. SCHEFFÉ
P. G. HOEL         A. WALD
W. G. MADOW        J. WOLFOWITZ


The ANNALS OF MATHEMATICAL STATISTICS is published quarterly by the
Institute of Mathematical Statistics, Mt. Royal & Guilford Aves., Baltimore 2,
Md. Subscriptions, renewals, orders for back numbers and other business
communications should be sent to the ANNALS OF MATHEMATICAL STATISTICS, Mt.
Royal & Guilford Aves., Baltimore 2, Md., or to the Secretary of the
Institute of Mathematical Statistics, P. S. Dwyer, 116 Rackham Hall, University of
Michigan, Ann Arbor, Mich.

Changes in mailing address which are to become effective for a given
issue should be reported to the Secretary on or before the 15th of the
month preceding the month of that issue. The months of issue are March,
June, September and December. Because of war-time difficulties of publication,
issues may often be from two to four weeks late in appearing.
Subscribers are therefore requested to wait at least 30 days after the month of issue
before making inquiries concerning non-delivery.

Manuscripts for publication in the ANNALS OF MATHEMATICAL STATISTICS
should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts
should be typewritten double-spaced with wide margins, and the original copy
should be submitted. Footnotes should be reduced to a minimum and whenever
possible replaced by a bibliography at the end of the paper; formulae in
footnotes should be avoided. Figures, charts, and diagrams should be drawn on
plain white paper or tracing cloth in black India ink twice the size they are to
be printed. Authors are requested to keep in mind typographical difficulties
of complicated mathematical formulae.

Authors will ordinarily receive only galley proofs. Fifty reprints without
covers will be furnished free. Additional reprints and covers furnished at cost.

The subscription price for the ANNALS is $5.00 per year. Single copies $1.50.
Back numbers are available at $5.00 per volume, or $1.50 per single issue.

COMPOSED AND PRINTED AT THE
WAVERLY PRESS, INC.
BALTIMORE, MD., U. S. A.


Entered as second-class matter at the Post Office at Baltimore, Maryland, under the Act of March 3, 1879 











THE ELEMENTARY GAUSSIAN PROCESSES 
By J. L. Doob


University of Illinois 


1. Introduction 


One of the simplest interesting classes of temporally homogeneous stochastic
processes is that for which the distributions of the defining chance variables
$\{x(t)\}$ are Gaussian. It is supposed that

(A) if $t_1 < \cdots < t_\nu$, the multivariate distribution of $x(t_1), \ldots, x(t_\nu)$ is
Gaussian,¹ and that

(B) this distribution is unchanged by translations of the $t$-axis.

The process is $N$-dimensional if $x(t)$ is an $N$-tuple $x_1(t), \ldots, x_N(t)$. The means
$E\{x(t)\}$² are independent of $t$, and will always be supposed to vanish in the
following discussion.

The correlation matrix function $R(t)$: $(r_{ij}(t))$ is defined by

(1.1.1)    $r_{ij}(t) = E\{x_i(s)\,x_j(s+t)\}.$

This expectation is independent of $s$, because of condition (B). The matrix
function $R(t)$ satisfies the equation

(1.1.2)    $r_{ij}(t) = r_{ji}(-t), \qquad i, j = 1, \ldots, N.$

It follows that when $t = 0$ the matrix is symmetric:

(1.1.3)    $r_{ij}(0) = r_{ji}(0), \qquad i, j = 1, \ldots, N,$

and it is also well known that $R(0)$ is non-negative definite. Conditions on the
functions $r_{ij}(t)$ necessary and sufficient that $R(t)$ be the correlation matrix
function of a stochastic process were found for the case $N = 1$ by Khintchine³
and for all $N$ by Cramér.⁴

Hypothesis (A), that the process is Gaussian, seems at first a restriction so
strong that Gaussian processes are unimportant. These processes are, however,
of fundamental importance, for the following reasons.

(i) If $R(t)$ is the correlation matrix function of any temporally homogeneous
stochastic process, there is, according to Khintchine and Cramér, a Gaussian
process with this same correlation function. This Gaussian process is uniquely
determined by the correlation function (assuming that all first order moments
vanish, as usual). Because of this intimate connection between the temporally
homogeneous Gaussian processes and the most general temporally homogeneous
processes, it is not surprising that very few facts are known about specifically
Gaussian processes, that is facts which are true of temporally homogeneous
Gaussian processes, but not of temporally homogeneous processes in general.

¹ Singular Gaussian distributions will not be excluded. For example the $x(t_j)$ may all
vanish identically.

² The expectation of a chance variable $x$ will be denoted by $E\{x\}$.

³ Mathematische Annalen, Vol. 109 (1934), p. 608.

⁴ Annals of Math., Vol. 41 (1940), pp. 215-230.

(ii) It follows from (i) that in any investigation of temporally homogeneous
stochastic processes involving only first and second moments (for example
least squares prediction by linear extrapolation) it may be assumed that the
variables are Gaussian. Under this hypothesis, the investigator may be helped
by the suggestive specialized interpretations possible in the Gaussian case of
results which hold in the general case. For example if $N = 1$, the least squares
best prediction in the Gaussian case for $x(n+1)$ in terms of a linear combination
of the variables $x(1), \ldots, x(n)$ is the conditional expectation of $x(n+1)$ for
given $x(1), \ldots, x(n)$, which is the least squares best prediction of $x(n+1)$
in terms of $x(1), \ldots, x(n)$ with no restriction on the functions involved. Thus
the linearity of the prediction, which must be part of the hypothesis in the
general case, is automatically true in the Gaussian case. There is necessarily a linear
least squares best prediction of $x(n+1)$ in terms of the complete past
$\ldots, x(n-1), x(n)$ since the corresponding conditional expectation is certainly
defined in the Gaussian case, and is linear in that case.
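The linear prediction described in (ii) can be sketched numerically. The following is a minimal illustration, not part of the paper: it assumes a one-dimensional process with the hypothetical correlation function $R(k) = \rho^{|k|}$ and solves the normal equations for the coefficients $a_1, \ldots, a_n$ that minimize $E\{[x(n+1) - \sum_j a_j x(j)]^2\}$. The helper names `solve` and `linear_predictor` are illustrative only.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def linear_predictor(R, n):
    """Coefficients a_1..a_n of the least squares linear prediction of x(n+1)
    from x(1), ..., x(n), for a process with correlation function R(k)."""
    gram = [[R(abs(i - j)) for j in range(n)] for i in range(n)]  # E{x(i+1) x(j+1)}
    rhs = [R(n - i) for i in range(n)]                            # E{x(i+1) x(n+1)}
    return solve(gram, rhs)


rho = 0.6  # assumed example value
coeffs = linear_predictor(lambda k: rho ** k, 3)
# Mean square error of the prediction: R(0) - sum_i a_i E{x(i+1) x(n+1)}
residual = 1.0 - sum(a * rho ** (3 - i) for i, a in enumerate(coeffs))
print(coeffs, residual)
```

For this particular correlation function only the most recent variable receives a non-zero coefficient; in the Gaussian case the same linear combination is also the conditional expectation.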

(iii) In many applications, there is a real justification for hypothesis (A) that 
the process is Gaussian. This is so in certain physical studies, for example, 
because the Maxwell distribution of molecular velocities is Gaussian. Examples 
will be given below. 

The processes discussed in the present paper are all temporally homogeneous
Gaussian processes. Most of the theorems will be valid for any temporally
homogeneous processes for which the second moments of the variables exist,⁵
with the following changes: independent chance variables which are linear
combinations of the $x(s)$ will become merely uncorrelated chance variables; the
convergence with probability 1 of a series of such chance variables will become
merely convergence in the mean; the conditional expectation of one such variable
$y$ for given values of others, $y_1, y_2, \ldots$ will become merely the linear
approximation $\sum_j a_j y_j$ of $y$ in terms of the $y_j$ which minimizes

    $E\{[y - \sum_j a_j y_j]^2\},$

that is to say the conditional expectation becomes the least squares linear
approximation.

⁵ The processes need not even be temporally homogeneous. It is necessary only that
$E\{x(s)\}$ and $E\{x(s)\,x(s+t)\}$ be independent of $s$.

The following theorem and its corollary are fundamental in the study of linear
prediction involving infinitely many variables. The results are implicit in much
of the work on the subject but do not seem to have been stated explicitly before.

THEOREM 1.2. Let $\ldots, x_0, x_1, \ldots$ be a sequence of one-dimensional Gaussian
chance variables with the property that if $n_1 < \cdots < n_\nu$, the multivariate
distribution of $x_{n_1}, \ldots, x_{n_\nu}$ is Gaussian and that

(1.2.1)    $E\{\ldots, x_{m-1}, x_m;\ x_n\} = x_m$

whenever $m < n$. Then $E\{x_m\} = a$ is independent of $m$, and

(1.2.2)    $\cdots \le E\{(x_m - a)^2\} \le E\{(x_{m+1} - a)^2\} \le \cdots.$

If the $\{x_n\}$ are defined for all negative integers,

(1.2.3)    $\lim_{m \to -\infty} x_m = x_{-\infty}$

exists with probability 1 and

(1.2.4)    $\lim_{m \to -\infty} E\{(x_{-\infty} - x_m)^2\} = 0.$

If the $\{x_n\}$ are defined for all positive integers, and if the dispersions in (1.2.2)
form a bounded sequence,

(1.2.3')    $\lim_{m \to \infty} x_m = x_{\infty}$

exists with probability 1, and

(1.2.4')    $\lim_{m \to \infty} E\{(x_{\infty} - x_m)^2\} = 0.$

It follows from (1.2.1) that

(1.2.5)    $E\{x_n\} = E\{E\{x_m;\ x_n\}\} = E\{x_m\}.$

Hence $\cdots = E\{x_0\} = E\{x_1\} = \cdots$. It will be no restriction to assume from
now on that

    $\cdots = E\{x_0\} = E\{x_1\} = \cdots = 0.$

It also follows from (1.2.1) that, if $m < n$,

(1.2.6)    $E\{x_m x_n\} = E\{E\{x_m;\ x_m x_n\}\} = E\{x_m E\{x_m;\ x_n\}\} = E\{x_m^2\}.$

Using this equation,

(1.2.7)    $E\{x_n^2\} = E\{[(x_n - x_m) + x_m]^2\} = E\{(x_n - x_m)^2\} + E\{x_m^2\},$

and the dispersions of the $x_n$ thus form a monotone non-decreasing sequence.

⁶ The conditional expectation of a chance variable $y$ for given values of a chance variable
$\eta$ will be denoted by $E\{\eta;\ y\}$.

⁷ Much of this theorem remains true if (1.2.1) is true but only the first moments of the
$x_n$ are supposed finite, no other hypothesis being made on their distributions. Cf. Doob,
Am. Math. Soc. Trans., Vol. 47 (1940), pp. 458-460.

⁸ Alternatively, (1.2.1) implies that $x_n - x_m$ is uncorrelated with $x_m$. Then $E\{x_m x_n\} =
E\{x_m^2 + (x_n - x_m)x_m\} = E\{x_m^2\}$.





Finally, using (1.2.6),

(1.2.8)    $E\{(x_{m+1} - x_m)(x_{n+1} - x_n)\} = 0, \qquad m \ne n.$

The series

(1.2.9)    $\sum_m (x_{m+1} - x_m)$

is therefore a series of mutually independent chance variables. According to a
well known theorem of Kolmogoroff, a series of mutually independent chance
variables converges with probability 1 if the means and dispersions form
convergent series. The present theorem follows at once from Kolmogoroff's
theorem.


COROLLARY. Let $x$ be a one-dimensional Gaussian chance variable and let

(1.2.10)    $x_{m1}, x_{m2}, \ldots, \qquad m = 0, \pm 1, \pm 2, \ldots$

be sequences of one-dimensional Gaussian chance variables with the property that
if $\nu \ge 1$, the multivariate distribution of $x, x_{m1}, \ldots, x_{m\nu}$ is Gaussian, and suppose
that each variable $x_{mj}$ is a member of every later sequence. Then

(1.2.11)    $\lim_{m \to -\infty} E\{x_{m1}, x_{m2}, \ldots;\ x\} = x_{-\infty},$
            $\lim_{m \to +\infty} E\{x_{m1}, x_{m2}, \ldots;\ x\} = x_{\infty}$

exist with probability 1, and in the mean, and

(1.2.12)    $x_{\infty} = E\{x_{mn},\ m = 0, \pm 1, \ldots,\ n = 1, 2, \ldots;\ x\}.$

It will first be shown that the sequence $\{x_m\}$, where

(1.2.13)    $x_m = E\{x_{m1}, x_{m2}, \ldots;\ x\},$

has the property demanded in the theorem. In fact, from the definition of
conditional expectation, the difference $x - x_n$ has expectation zero and is
independent of the variables $\{x_{mj}\}$ for $m \le n$, and therefore of the variables
$\ldots, x_{n-1}, x_n$. Hence

    $(x - x_n) - (x - x_{n+1}) = x_{n+1} - x_n$

has expectation zero and is independent of the variables $\ldots, x_{n-1}, x_n$.
Therefore the sequence $\{x_{n+1} - x_n\}$ is a sequence of mutually independent chance
variables with vanishing expectations. This implies (1.2.1) if $m < n$ because

(1.2.14)    $E\{\ldots, x_m;\ x_n\} = E\{\ldots, x_m;\ x_m + \sum_{j=m}^{n-1} (x_{j+1} - x_j)\}$
            $= E\{\ldots, x_m;\ x_m\} + \sum_{j=m}^{n-1} E\{\ldots, x_m;\ x_{j+1} - x_j\} = x_m.$

Let $a$ be the common value of $E\{x\}$, $E\{x_m\}$. Since $x - x_m$ is independent of $x_m$,

(1.2.15)    $E\{(x_m - a)^2\} + E\{(x - x_m)^2\} = E\{(x - a)^2\}.$

Hence the sequence of dispersions of the $x_m$ is bounded and according to Theorem
1.2 the limits $x_{-\infty}$ and $x_{\infty}$ in (1.2.11) exist with probability 1. Since $x - x_n$ has
expectation zero and is independent of $x_{mj}$ for $m \le n$, $x - x_{\infty}$ also has
expectation zero and is independent of $x_{mj}$ for all $m$, that is (1.2.12) is true.

The simplest non-trivial special case of this theorem is the following:
Let $x_1, x_2, \ldots, x$ be one-dimensional Gaussian chance variables with the property
that if $\nu \ge 1$ the multivariate distribution of $x, x_1, \ldots, x_\nu$ is Gaussian. Then

(1.2.16)    $\lim_{n \to \infty} E\{x_1, \ldots, x_n;\ x\} = E\{x_1, x_2, \ldots;\ x\}$

with probability 1, and this limit is also a limit in the mean.

As stated, the theorem and corollary are true without the hypothesis that the 
chance variables concerned are Gaussian. (The existence of second moments 
must be assumed if the limits are to exist as limits in the mean.) They are 
stated for Gaussian variables because the proof is simple in that case, and be- 
cause that is sufficient for the purposes of this paper. 
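The special case (1.2.16) can be made concrete with a small worked sketch. The setting is an assumption chosen for illustration and does not appear in the paper: $x$ is Gaussian with mean 0 and dispersion `signal_var`, and $x_k = x + e_k$ where the $e_k$ are independent Gaussian errors with dispersion `noise_var`. By symmetry the conditional expectation of $x$ for given $x_1, \ldots, x_n$ is $w_n \sum_k x_k$, with the weight fixed by the normal equation. The code evaluates the dispersions exactly from second moments, without simulation.

```python
def conditional_expectation_stats(n, signal_var, noise_var):
    """Second-moment summary of E{x_1, ..., x_n; x} for x_k = x + e_k.

    Returns (weight, dispersion of the conditional expectation,
    dispersion of the error x - conditional expectation)."""
    # Normal equation: E{(x - w * sum_k x_k) x_j} = signal_var - w (n signal_var + noise_var) = 0
    weight = signal_var / (n * signal_var + noise_var)
    # E{(sum_k x_k)^2} = n^2 signal_var + n noise_var
    pred_var = weight * weight * (n * n * signal_var + n * noise_var)
    # E{(x - w sum_k x_k)^2}, expanded term by term
    resid_var = signal_var - 2.0 * weight * n * signal_var + pred_var
    return weight, pred_var, resid_var


signal_var, noise_var = 2.0, 3.0  # assumed example values
stats = [conditional_expectation_stats(n, signal_var, noise_var) for n in range(1, 2001)]
pred_vars = [s[1] for s in stats]
resid_vars = [s[2] for s in stats]
print(pred_vars[0], pred_vars[-1], resid_vars[-1])
```

In this sketch the dispersions of the conditional expectations increase with $n$ and stay bounded by the dispersion of $x$, as in (1.2.2) and (1.2.15), while the mean square error tends to 0, so the limit in (1.2.16) is $x$ itself.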

In discussing temporally homogeneous Gaussian (t.h.G.) processes whose parameter
$t$ is not restricted to be integral, the usual continuity hypothesis will be made.
It will be supposed that $R(t)$ is continuous at $t = 0$:

(1.3.1)    $\lim_{t \to 0} [R(t) - R(0)] = -\tfrac{1}{2} \lim_{t \to 0} E\{[x(t) - x(0)]^2\} = 0.$

It is then easily concluded that $R(t)$ is everywhere continuous.

In the continuous parameter case, it would be useful to have the conditions
on $R(t)$ necessary and sufficient for the continuity in $t$ of the chance variables
$x(t)$ and for the existence of the derivative. No set of necessary and sufficient
conditions for the continuity of $x(t)$ is known, although the fact of continuity
will not be difficult to prove in the special cases to be considered in §4. The
conditions for the existence of $x'(t)$ are quite simple, and will be given in Theorem
1.4.

The spectral function of a one-dimensional t.h.G. process will play an essential
role in some of the theorems to be discussed below. If $R(n)$ is the correlation
function of a one-dimensional t.h.G. process, $R(n)$ can be expressed in either
of the forms

(1.3.2)    $R(n) = \int_0^{\pi} \cos n\lambda \, dF(\lambda)$

(1.3.2')    $R(n) = \int_{-\pi}^{\pi} e^{in\lambda} \, dG(\lambda)$

where $F(\lambda)$, called the spectral function of the process, and $G(\lambda)$, called the
complex spectral function of the process, are real monotone non-decreasing
functions satisfying the following conditions:

(1.3.3)    $F(0) = 0, \qquad G(-\pi) = 0,$
           $F(\lambda-) = F(\lambda),\ 0 < \lambda < \pi, \qquad G(\lambda-) = G(\lambda),\ -\pi < \lambda < \pi,$
           $G(\lambda) - G(0+) = G(0) - G(-\lambda+),$

(1.3.4)    $F(\pi) = G(\pi),$
           $F(\lambda) = G(\lambda) - G(-\lambda+) = 2G(\lambda) - G(0) - G(0+), \qquad 0 < \lambda < \pi,$
           $F'(\lambda) = 2G'(\lambda).$

The last equation of course holds only at points where the derivatives exist. The
forms (1.3.2), (1.3.2') are derived trivially from each other. The correlation
function determines the spectral functions uniquely, if the latter are supposed
to satisfy (1.3.3). In fact, at the points of continuity of $F(\lambda)$, $G(\lambda)$:

(1.3.5)    $F(\lambda) = \dfrac{\lambda R(0)}{\pi} + \dfrac{2}{\pi} \sum_{n=1}^{\infty} R(n) \dfrac{\sin n\lambda}{n},$
           $G(\lambda) = \dfrac{(\lambda + \pi) R(0)}{2\pi} + \dfrac{1}{2\pi} \lim_{\nu \to \infty} \sum_{n=-\nu,\ n \ne 0}^{\nu} R(n) \dfrac{e^{in\lambda}}{in}.$
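The representation (1.3.2) can be checked numerically in a simple case. The sketch below rests on an assumption introduced here for illustration: the correlation function is taken to be $R(n) = \rho^{|n|}$, for which the spectral function has the density $F'(\lambda) = (1 - \rho^2) / [\pi (1 - 2\rho \cos\lambda + \rho^2)]$ on $(0, \pi)$. The integral is approximated by the midpoint rule; the function names are hypothetical.

```python
import math


def spectral_density(lam, rho):
    """Assumed density F'(lambda) on (0, pi) for the correlation function rho**|n|."""
    return (1.0 - rho * rho) / (math.pi * (1.0 - 2.0 * rho * math.cos(lam) + rho * rho))


def correlation_from_spectrum(n, rho, steps=2000):
    """Midpoint-rule approximation of the integral of cos(n lambda) dF(lambda) over (0, pi)."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        lam = (k + 0.5) * h
        total += math.cos(n * lam) * spectral_density(lam, rho)
    return total * h


rho = 0.5  # assumed example value
recovered = [correlation_from_spectrum(n, rho) for n in range(6)]
print(recovered)
```

The recovered values agree with $\rho^n$ to within the quadrature error, and the value for $n = 0$ is the total spectral mass $F(\pi) = R(0)$.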
Conversely if any F(A) or G(A) satisfying the stated conditions is used to deter- 
mine an R(n) by means of (1.3.2) or (1.3.2’), R(n) is the correlation function of a 
t.h.G. process. The representation of R(n) in terms of G(A) is frequently more 
convenient than that in terms of F(A), because of the simple properties of the 
exponential function. The following relation, which will be used below, exhibits 
the elegance attained by the use of G(A): 


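(1.3.6)    $E\{(\sum_m c_m x(m)) \overline{(\sum_n c_n x(n))}\} = \sum_{m,n} c_m \bar{c}_n R(m - n)$
           $= \int_{-\pi}^{\pi} \left(\sum_m c_m e^{im\lambda}\right) \left(\sum_n \bar{c}_n e^{-in\lambda}\right) dG(\lambda).$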
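Relation (1.3.6) is easy to verify when the complex spectral function is a step function. The following sketch assumes a hypothetical $G(\lambda)$ with four jumps placed symmetrically about 0 (so that $R(n)$ is real) and arbitrary complex constants $c_m$; both are illustrative choices, not taken from the paper.

```python
import cmath

# Assumed jumps of G: (location lambda, size of jump), symmetric about 0.
atoms = [(-1.0, 0.3), (1.0, 0.3), (-2.5, 0.2), (2.5, 0.2)]


def R(n):
    """Correlation function from (1.3.2'): sum over jumps of p * exp(i n lambda)."""
    return sum(p * cmath.exp(1j * n * lam) for lam, p in atoms)


# Arbitrary complex constants c_m, indexed by m (illustrative values).
c = {0: 1 + 2j, 1: -0.5j, 3: 2.0}

# Left side: double sum of c_m * conj(c_n) * R(m - n).
lhs = sum(cm * cn.conjugate() * R(m - n) for m, cm in c.items() for n, cn in c.items())

# Right side: integral of |sum_m c_m exp(i m lambda)|^2 with respect to G.
rhs = sum(
    p * abs(sum(cm * cmath.exp(1j * m * lam) for m, cm in c.items())) ** 2
    for lam, p in atoms
)
print(lhs, rhs)
```

The two sides agree up to rounding, and the right side is visibly non-negative, which is the reason the form in terms of $G(\lambda)$ is convenient.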


The correlation function of a one-dimensional continuous parameter t.h.G.
process can be represented in either of the following forms:

(1.3.7)    $R(t) = \int_0^{\infty} \cos t\lambda \, dF(\lambda)$

(1.3.7')    $R(t) = \int_{-\infty}^{\infty} e^{it\lambda} \, dG(\lambda)$

where the spectral function $F(\lambda)$ and the complex spectral function $G(\lambda)$ are
monotone non-decreasing and satisfy the conditions

(1.3.8)    $F(0) = 0, \qquad G(-\infty) = 0,$
           $F(\lambda-) = F(\lambda),\ 0 < \lambda < \infty, \qquad G(\lambda-) = G(\lambda),$
           $G(\lambda) - G(0+) = G(0) - G(-\lambda+),$

(1.3.9)    $F(\infty) = G(\infty),$
           $F(\lambda) = G(\lambda) - G(-\lambda+) = 2G(\lambda) - G(0) - G(0+), \qquad 0 < \lambda < \infty,$
           $F'(\lambda) = 2G'(\lambda).$

The last equation of course only holds at points where the derivatives exist.
The correlation function $R(t)$ determines the spectral functions uniquely if the
latter are supposed to satisfy (1.3.8). In fact, at the points of continuity of
$F(\lambda)$, $G(\lambda)$:

(1.3.10)    $F(\lambda) = \dfrac{2}{\pi} \int_0^{\infty} R(t) \dfrac{\sin t\lambda}{t} \, dt,$
            $G(\lambda) = \lim_{T \to \infty} \dfrac{1}{2\pi} \int_{-T}^{T} R(t) \dfrac{e^{-it\lambda} - 1}{-it} \, dt + G(0).$

⁹ H. Wold, A Study in the Analysis of Stationary Time Series, Uppsala, (1938), p. 66.


THEOREM 1.4. Let $\{x(t)\}$ be the variables of a one-dimensional continuous
parameter t.h.G. process with correlation function $R(t)$ and spectral function $F(\lambda)$.
If

(1.4.1)    $\int_0^{\infty} \lambda^2 \, dF(\lambda) < \infty$

then

(i) $R'(t)$, $R''(t)$ exist and are continuous, and $R'(0) = 0$;
(ii) $x(t)$ is an absolutely continuous function of $t$, with probability 1;
(iii) for each $t$,

(1.4.2)    $\lim_{h \to 0} \dfrac{x(t+h) - x(t)}{h} = x'(t)$

exists, with probability 1, and this convergence is also true in the mean:

(1.4.3)    $\lim_{h \to 0} E\left\{\left[\dfrac{x(t+h) - x(t)}{h} - x'(t)\right]^2\right\} = 0;$

(iv) the $x'(t)$ process is a t.h.G. process, with correlation function $-R''(t)$ and
spectral function $\int_0^{\lambda} \mu^2 \, dF(\mu)$.

Conversely if

(1.4.4)    $\liminf_{h \to 0} \dfrac{2R(0) - 2R(h)}{h^2} = \liminf_{h \to 0} E\left\{\left[\dfrac{x(h) - x(0)}{h}\right]^2\right\} < \infty,$

then (1.4.1) is true.
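The criterion (1.4.4) depends only on the behaviour of $R(t)$ near $t = 0$, which a short numerical sketch makes visible. Two correlation functions are compared; both are assumptions chosen for illustration and neither is taken from this section: $R(t) = e^{-t^2/2}$, which has two continuous derivatives, and $R(t) = e^{-|t|}$, which has a corner at the origin.

```python
import math


def difference_quotient(R, h):
    """The quantity [2 R(0) - 2 R(h)] / h^2 = E{[(x(h) - x(0)) / h]^2} of (1.4.4)."""
    return 2.0 * (R(0.0) - R(h)) / (h * h)


def smooth(t):
    """Assumed example: R(t) = exp(-t^2 / 2), with -R''(0) = 1."""
    return math.exp(-t * t / 2.0)


def rough(t):
    """Assumed example: R(t) = exp(-|t|), not differentiable at t = 0."""
    return math.exp(-abs(t))


for h in (1e-1, 1e-2, 1e-3):
    print(h, difference_quotient(smooth, h), difference_quotient(rough, h))
```

For the smooth example the quotient settles near $-R''(0) = 1$, so the hypothesis of the converse holds; for the second example it grows like $2/h$, so (1.4.4) fails and the theorem gives no mean square derivative.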


This theorem is due to Slutsky. The proof will be sketched here, for
completeness. (The hypothesis that the process is a Gaussian process is immaterial,
since only the second moments are involved in the proof.)

Proof of (i). If the integral (1.4.1) exists, $R'(t)$, $R''(t)$ can be obtained by
differentiating under the integral sign in (1.3.7):

(1.4.5)    $R'(t) = -\int_0^{\infty} \lambda \sin t\lambda \, dF(\lambda),$
           $R''(t) = -\int_0^{\infty} \lambda^2 \cos t\lambda \, dF(\lambda).$

Then $R'(t)$, $R''(t)$ are continuous functions, and $R'(0) = 0$.

Proof of (ii), (iii), (iv). The quantity

(1.4.6)    $E\left\{\left[\dfrac{x(t+h_1) - x(t)}{h_1} - \dfrac{x(t+h_2) - x(t)}{h_2}\right]^2\right\}$

can be evaluated in terms of the correlation function $R(t)$, and approaches 0
with $h_1$, $h_2$, if the second derivative $R''(t)$ exists. There is therefore a chance
variable $y(t)$ to which the difference quotient converges in the mean:

(1.4.7)    $\lim_{h \to 0} E\left\{\left[\dfrac{x(t+h) - x(t)}{h} - y(t)\right]^2\right\} = 0.$

The $y(t)$ process is a t.h.G. process. Moreover the equation

(1.4.8)    $E\{x(s)\,x(s+t)\} = R(t)$

can be differentiated to give

(1.4.9)    $E\{x(s)\,y(s+t)\} = E\{x(s-t)\,y(s)\} = R'(t)$

and this in turn when differentiated becomes

(1.4.10)    $E\{y(s-t)\,y(s)\} = E\{y(s)\,y(s+t)\} = -R''(t).$

Hence the $y(t)$ process has correlation function $-R''(t)$. Finally,

(1.4.11)    $E\left\{\left[x(t) - x(0) - \int_0^t y(s)\,ds\right]^2\right\} = E\{[x(t) - x(0)]^2\}$
            $+ \int_0^t \int_0^t E\{y(s)\,y(s')\}\,ds\,ds' - 2\int_0^t E\{[x(t) - x(0)]\,y(s)\}\,ds = 0,$

(evaluating the right side of (1.4.11) in terms of $R(t)$, $R'(t)$, $R''(t)$). Thus $x(t)$
is absolutely continuous, with probability 1, and $y(t)$ is the derived function
$x'(t)$. Hence $x'(t)$ exists for almost all $t$, with probability 1.¹⁰ It follows
(Fubini's theorem) that the limit in (1.4.2) exists for each $t$, with probability 1,
except possibly for a $t$-set of Lebesgue measure 0. Since the process is t.h., the
exceptional set must be either empty or the whole $t$-line. The exceptional set
is therefore empty.

Conversely if (1.4.4) is true, (1.4.1) follows at once from (1.3.7).

¹⁰ For the exact meaning and measure-theoretic justification for statements of this type,
see Doob, Am. Math. Soc. Trans., Vol. 42 (1937), pp. 107-40.

It will be convenient to use condensed notation below. If $x$: $(x_1, \ldots, x_N)$,
$y$: $(y_1, \ldots, y_N)$ are $N$-dimensional vectors and if $A$: $(a_{ij})$ is an $N$-dimensional
square matrix, $x \cdot y$ will denote the matrix $(x_i \bar{y}_j)$, $Ax$ the vector with components
$\sum_j a_{ij} x_j$ and $(x, y)$ the number $\sum_j x_j \bar{y}_j$. The adjoint matrix $(a^*_{ij})$: $a^*_{ij} = \bar{a}_{ji}$
will be denoted by $A^*$. Throughout this paper, the chance variables will be
real-valued, but it will be convenient to use complex constant vectors. The
identity matrix will be denoted by $I$. It will be convenient to denote the $i, j$th
term of the matrix $A$ by $(A)_{ij}$. The following equations will be used frequently:

    $Ax \cdot By = A(x \cdot y)B^*, \qquad (Ax, y) = (x, A^*y).$

If $x$ is a chance variable, it is clear that $E\{x \cdot x\}$ is a symmetric non-negative
definite matrix.
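The two identities of the condensed notation can be checked directly on small examples. The sketch below is only an illustration with arbitrary complex vectors and matrices; the helper names (`outer`, `inner`, `adjoint`, and so on) are hypothetical and simply mirror the definitions of $x \cdot y$, $(x, y)$ and $A^*$ given above.

```python
def outer(x, y):
    """The matrix x . y with entries x_i * conj(y_j)."""
    return [[xi * complex(yj).conjugate() for yj in y] for xi in x]


def inner(x, y):
    """The number (x, y) = sum_j x_j * conj(y_j)."""
    return sum(xi * complex(yi).conjugate() for xi, yi in zip(x, y))


def matvec(a, x):
    """The vector A x with components sum_j a_ij x_j."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def matmul(a, b):
    """Ordinary matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def adjoint(a):
    """The adjoint matrix A*, with entries conj(a_ji)."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))] for i in range(len(a[0]))]


# Arbitrary illustrative data.
x = [1 + 1j, 2.0]
y = [0.5j, -1 + 2j]
A = [[1.0, 2j], [3.0, 1 - 1j]]
B = [[0.0, 1.0], [1j, 2.0]]

left = outer(matvec(A, x), matvec(B, y))           # Ax . By
right = matmul(matmul(A, outer(x, y)), adjoint(B))  # A (x . y) B*
max_gap = max(abs(left[i][j] - right[i][j]) for i in range(2) for j in range(2))
inner_gap = abs(inner(matvec(A, x), y) - inner(x, matvec(adjoint(A), y)))
print(max_gap, inner_gap)
```

Both gaps are zero up to rounding, in agreement with $Ax \cdot By = A(x \cdot y)B^*$ and $(Ax, y) = (x, A^*y)$.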

The simplest Gaussian processes are those in which the distribution of future 
states is based not on the complete past, but only on the immediate present. 
The precise definition of this (Markoff) property is the following. 

(C) If $t_1 < \cdots < t_{\nu+1}$, the conditional distribution of $x(t_{\nu+1})$ for given values of
$x(t_1), \ldots, x(t_\nu)$ depends only on the value assigned to $x(t_\nu)$. The conditional
distribution of $x(t_{\nu+1})$ for given values of $x(t_1), \ldots, x(t_\nu)$ will then be simply the
conditional distribution of $x(t_{\nu+1})$ for the assigned value of $x(t_\nu)$.

The processes to be discussed in this paper are the processes with properties 
(A), (B), (C): temporally homogeneous Gaussian Markoff (t.h.G.M.) processes. 
The properties of t.h.G.M. processes will also be used to derive properties of the 
most important simple types of one-dimensional t.h.G. processes—those with 
rational spectral density functions. Some of the results are contained implicitly 
in the work of previous writers, but the presentation of the results has in all cases 
been chosen to stress their specific probability significance, and may therefore 
appeal even to readers familiar with previous work. 

The condition (C) on a Gaussian process is equivalent to the condition (C')
that if $t_1 < \cdots < t_{\nu+1}$,

(1.5.1)    $E\{x(t_1), \ldots, x(t_\nu);\ x(t_{\nu+1})\} = E\{x(t_\nu);\ x(t_{\nu+1})\}.$

In fact (C) is at least as strong as (C'). Conversely if (C') is true,

(1.5.2)    $x(t_{\nu+1}) = [x(t_{\nu+1}) - E\{x(t_\nu);\ x(t_{\nu+1})\}] + E\{x(t_\nu);\ x(t_{\nu+1})\}$
           $= y + E\{x(t_\nu);\ x(t_{\nu+1})\},$

where $y$ is a Gaussian chance variable with mean 0 uncorrelated with and
therefore independent of $x(t_1), \ldots, x(t_\nu)$, and the last term of (1.5.2) is simply a
multiple of $x(t_\nu)$. Then the conditional distribution of $x(t_{\nu+1})$ for given $x(t_1),
\ldots, x(t_\nu)$ is Gaussian, with mean $E\{x(t_\nu);\ x(t_{\nu+1})\}$ and dispersion that
of $y$. Since this conditional distribution depends only on $x(t_\nu)$, property (C')
implies property (C). Hence these properties are equivalent. The condition
(C') can be written in the form

(1.5.3)    $E\{x(r),\ r \le s;\ x(s+t)\} = E\{x(s);\ x(s+t)\}, \qquad t > 0.$


In many applications the stochastic processes either have this property already 
or will have it if the dimensionality of the processes is increased by the adjunc- 
tion of auxiliary chance variables. In the latter case the process is called a 
component process of a t.h.G.M. process. Component processes are discussed 
in detail below. If a process is a t.h.G. process, the right side of (1.5.3) is a
linear transformation (depending only on $t$) of $x(s)$:

(1.5.4)    $E\{x(s);\ x(s+t)\} = A(t)x(s), \qquad t > 0.$

The matrix function $A(t)$ will be called the transition matrix function. It
satisfies the equation (obtained by performing the operation $E\{x(s) \cdot \ \}$ on both
sides of (1.5.4))

(1.5.5)    $R(t) = R(0)A(t)^*, \qquad t > 0,$

but is otherwise unrestricted since if (1.5.5) is true, the difference $x(s+t) - A(t)x(s)$
is uncorrelated with and therefore independent of $x(s)$. In many applications
the elements of $R(t)$ will vanish identically except in square matrices down the
main diagonal. If this is true, $A(t)$ can also be assumed in this form.

If the variables {x(t)} determine an N-dimensional t.h.G.M. process, and if B 
is a non-singular N-dimensional square matrix, the variables {Bx(t)} also de- 
termine a t.h.G.M. process. Two processes connected in this way will be called 
equivalent. If two t.h.G. processes are equivalent, and if one is a Markoff 
process, the other must be also. If there is a change of variable 


(1.5.6)    $y(t) = Bx(t)$

taking the t.h.G.M. $x(t)$ process with transition matrix $A(t)$ and correlation
matrix $R(t)$ into the equivalent $y(t)$ process with transition matrix $A_y(t)$ and
correlation matrix $R_y(t)$, then

(1.5.7)    $A_y(t) = BA(t)B^{-1}, \qquad R_y(t) = BR(t)B^*.$


If $\{x(t)\}$, $\{y(t)\}$, $\{z(t)\}$ determine t.h.G. processes of dimensions $\alpha$, $\beta$ and
$\alpha + \beta$ respectively, if the process determined by

    $\{x_1(t), \ldots, x_\alpha(t), y_1(t), \ldots, y_\beta(t)\}$

is equivalent to the $z(t)$ process, and if every $x(s)$ is independent of every $y(t)$,
the $z(t)$ process will be called the direct product of the $x(t)$ and $y(t)$ processes.
The extension of the definition to direct products of more than two processes
is clear. If the $x(t)$ and $y(t)$ processes are Markoff processes, their direct product
is also a Markoff process. Conversely if the $z(t)$ process is a Markoff process,
the factor processes must also be Markoff processes. The following facts about
matrices will be used below. If $A$ is any $N$-dimensional matrix, there is a
non-singular $N$-dimensional matrix $B$ such that $BAB^{-1}$ is in Jordan canonical form:
the elements of $BAB^{-1}$ vanish except for those in certain submatrices down the
main diagonal. Each of these submatrices has the form

(1.5.8)    $\begin{pmatrix} \lambda & 0 & \cdots & 0 & 0 \\ 1 & \lambda & \cdots & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & \lambda \end{pmatrix}$

or simply $(\lambda)$ if it is one-dimensional. The $\lambda$'s are the characteristic values of $A$,
that is the roots of the characteristic equation $\det |A - \lambda I| = 0$, and the sum
of the dimensions of the submatrices with a given $\lambda$ is the multiplicity of $\lambda$ as a
root of this equation. The matrix $A$ is said to have simple elementary divisors
corresponding to a given root $\lambda$ of the characteristic equation if the submatrices
in (1.5.8) with that $\lambda$ are all of dimension 1. Thus orthogonal matrices,
symmetric matrices, and skew symmetric matrices have only simple elementary
divisors, since they can be put in diagonal form (with $\lambda$'s of modulus 1, real,
pure imaginary, respectively). The transformation $B$ and the $\lambda$'s may not be
real. If $A$ is real, however, there is a real matrix $B$ such that the elements of
$B^{-1}AB$ vanish except for square submatrices down the main diagonal, and the
characteristic roots of different submatrices are neither equal nor conjugate
imaginary.

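The powers of a matrix in Jordan canonical form are easily calculated using
the fact that

(1.5.9)    $\begin{pmatrix} \lambda & 0 & \cdots & 0 \\ 1 & \lambda & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 1 & \lambda \end{pmatrix}^{n} = \begin{pmatrix} \lambda^n & 0 & \cdots & 0 \\ n\lambda^{n-1} & \lambda^n & \cdots & 0 \\ \binom{n}{2}\lambda^{n-2} & n\lambda^{n-1} & \ddots & \vdots \\ \vdots & \ddots & \ddots & \lambda^n \end{pmatrix}.$

It follows that in the general case the elements $(A^n)_{ij}$ are linear combinations
of $\lambda_1^n$, $n\lambda_1^{n-1}$, etc., where $\lambda_1, \lambda_2, \ldots$ are the characteristic values of $A$. Hence
if $(A^n)_{ij} \to 0$ as $n \to \infty$, the approach must be exponential. The terms of $A^n$
certainly go to 0 if all the characteristic values of A have modulus less than 1.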
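The formula for powers of a Jordan block can be confirmed by direct multiplication. The sketch below is illustrative only: it builds a block with $\lambda$ on the diagonal and 1 on the subdiagonal, raises it to the $n$th power by repeated multiplication in exact rational arithmetic, and compares the result with the binomial closed form, in which the entry $k$ places below the diagonal is $\binom{n}{k}\lambda^{n-k}$. The function names and the chosen $\lambda$, size and exponent are assumptions.

```python
from fractions import Fraction
from math import comb


def jordan_block(lam, size):
    """Block with lam on the diagonal and 1 on the subdiagonal."""
    return [[lam if i == j else (1 if i == j + 1 else 0) for j in range(size)]
            for i in range(size)]


def matmul(a, b):
    """Ordinary matrix product of square matrices."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def matrix_power(a, n):
    """a**n by repeated multiplication, starting from the identity."""
    size = len(a)
    result = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, a)
    return result


def closed_form(lam, size, n):
    """Entry (i, j) is C(n, i - j) * lam**(n - (i - j)) on and below the diagonal."""
    def entry(i, j):
        k = i - j
        return comb(n, k) * lam ** (n - k) if 0 <= k <= n else Fraction(0)
    return [[entry(i, j) for j in range(size)] for i in range(size)]


lam = Fraction(1, 2)  # assumed example value, modulus less than 1
direct = matrix_power(jordan_block(lam, 3), 5)
formula = closed_form(lam, 3, 5)
print(direct == formula, direct[2][0])
```

With $|\lambda| < 1$ every entry is a polynomial in $n$ times $\lambda^n$, which is the exponential approach to 0 mentioned in the text.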

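The matrix $e^{A}$ is defined by the usual series formula for the exponential
function. If $A$ has the form (1.5.8), $e^{tA}$ can be calculated using (1.5.9):

(1.5.10)    $e^{tA} = \begin{pmatrix} e^{t\lambda} & 0 & \cdots & 0 \\ t e^{t\lambda} & e^{t\lambda} & \cdots & 0 \\ \frac{t^2}{2!} e^{t\lambda} & t e^{t\lambda} & \ddots & \vdots \\ \vdots & \ddots & \ddots & e^{t\lambda} \end{pmatrix}.$

It follows that in the general case the elements $(e^{tA})_{ij}$ are linear combinations of
$e^{t\lambda_1}$, $t e^{t\lambda_1}$, etc., where $\lambda_1, \lambda_2, \ldots$ are the characteristic values of $A$. If $(e^{tA})_{ij} \to 0$
as $t \to \infty$ the approach must be exponential. The terms of $e^{tA}$ certainly go to 0
if all the characteristic values of A have negative real parts.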
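The exponential of a Jordan block can be checked in the same spirit. This sketch sums the defining power series for $e^{tA}$ in floating point and compares it with the closed form, whose entry $k$ places below the diagonal is $(t^k / k!)\,e^{t\lambda}$. The block size, $\lambda$, $t$ and the number of series terms are assumed values chosen for illustration.

```python
import math


def jordan_block(lam, size):
    """Block with lam on the diagonal and 1 on the subdiagonal."""
    return [[lam if i == j else (1.0 if i == j + 1 else 0.0) for j in range(size)]
            for i in range(size)]


def exp_series(a, t, terms=60):
    """Partial sum of the series I + tA + (tA)^2/2! + ... with `terms` terms."""
    size = len(a)
    total = [[float(i == j) for j in range(size)] for i in range(size)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        # term becomes (tA)^k / k!
        term = [[sum(term[i][m] * a[m][j] for m in range(size)) * t / k
                 for j in range(size)] for i in range(size)]
        total = [[total[i][j] + term[i][j] for j in range(size)] for i in range(size)]
    return total


def exp_closed_form(lam, size, t):
    """Entry (i, j) is t**k / k! * exp(t * lam) with k = i - j >= 0, else 0."""
    def entry(i, j):
        k = i - j
        return t ** k / math.factorial(k) * math.exp(t * lam) if k >= 0 else 0.0
    return [[entry(i, j) for j in range(size)] for i in range(size)]


lam, t, size = -0.5, 2.0, 3  # assumed example values
series = exp_series(jordan_block(lam, size), t)
closed = exp_closed_form(lam, size, t)
max_gap = max(abs(series[i][j] - closed[i][j]) for i in range(size) for j in range(size))
print(max_gap)
```

Because $\lambda$ has negative real part here, every entry of the closed form is a polynomial in $t$ times a decaying exponential.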































































































A t.h.G.M. process will be called deterministic if the least squares prediction
of $x(s+t)$ for given $x(s)$, $(t > 0)$, that is $E\{x(s);\ x(s+t)\}$, is always correct:

(1.6.1)    $x(s+t) = A(t)x(s), \qquad t > 0,$

with probability 1.

The following classification of deterministic processes will be useful later. It 
will be shown that any t.h.G.M. process is the direct product of processes of the 
following four deterministic types, and of a factor process with no deterministic 
factors. 

M(0). Let $\{x(t)\}$ be the variables of a one-dimensional t.h.G.M. process,
with $x(t) = 0$ with probability 1. (The chance variable which is 0 with
probability 1 is considered as a Gaussian variable with mean 0 and dispersion 0.)
The correlation function of the process vanishes identically.

M(1). A one-dimensional t.h.G.M. process which satisfies

(1.6.2)    $x(t) = x(0), \qquad E\{x(t)\} = 0, \qquad E\{x(t)^2\} > 0,$

will be called a process of type M(1). The correlation function $R(t)$ is positive
and independent of $t$.

M(−1). A one-dimensional t.h.G.M. process with an integral-valued parameter
$n$, satisfying

(1.6.3)    $\cdots = x(-1) = -x(0) = x(1) = \cdots, \qquad E\{x(n)\} = 0, \qquad E\{x(n)^2\} > 0,$

will be called a process of type M(−1). The correlation function is alternately
positive and negative: $R(n) = (-1)^n R(0)$.

M($e^{i\theta}$). A two-dimensional t.h.G.M. process with

(1.6.4)    $E\{x_j(0)\} = 0, \qquad E\{x_j(0)^2\} = \sigma^2 > 0, \qquad E\{x_1(0)x_2(0)\} = 0,$
           $x_1(t) = x_1(0) \cos t\theta - x_2(0) \sin t\theta,$
           $x_2(t) = x_1(0) \sin t\theta + x_2(0) \cos t\theta,$

will be called a process of type M($e^{i\theta}$). The correlation function is given by

(1.6.5)    $R(t) = \begin{pmatrix} \sigma^2 \cos t\theta & \sigma^2 \sin t\theta \\ -\sigma^2 \sin t\theta & \sigma^2 \cos t\theta \end{pmatrix}.$
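The correlation function (1.6.5) can be recovered from the defining relations (1.6.4) by a short computation. In the sketch below, `var` stands for the common dispersion of $x_1(0)$ and $x_2(0)$, and the values of `var`, `theta`, `s` and `t` are assumptions chosen for illustration. The entry $r_{ij}(t) = E\{x_i(0)\,x_j(t)\}$ is evaluated from $x(t) = A(t)x(0)$, where $A(t)$ is the rotation through the angle $t\theta$.

```python
import math


def rotation(angle):
    """Transition matrix A(t) of (1.6.4) for angle = t * theta."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]


def matmul(a, b):
    """Product of 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def correlation(t, theta, var):
    """r_ij(t) = E{x_i(0) x_j(t)} = sum_k A(t)_jk E{x_i(0) x_k(0)}, with R(0) = var * I."""
    cov0 = [[var, 0.0], [0.0, var]]
    a = rotation(t * theta)
    return [[sum(a[j][k] * cov0[i][k] for k in range(2)) for j in range(2)] for i in range(2)]


theta, var, s, t = 0.7, 2.5, 0.4, 1.3  # assumed example values
r = correlation(t, theta, var)
expected = [[var * math.cos(t * theta), var * math.sin(t * theta)],
            [-var * math.sin(t * theta), var * math.cos(t * theta)]]
gap_r = max(abs(r[i][j] - expected[i][j]) for i in range(2) for j in range(2))

# The transition matrices also compose: A(s + t) = A(s) A(t).
composed = matmul(rotation(s * theta), rotation(t * theta))
direct = rotation((s + t) * theta)
gap_a = max(abs(composed[i][j] - direct[i][j]) for i in range(2) for j in range(2))
print(gap_r, gap_a)
```

The computed matrix has the layout of (1.6.5), with opposite signs on the two off-diagonal sine terms, and the rotations compose as transition matrices of a deterministic process should.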


A process with variables $\{x(t)\}$ will be called degenerate if there are constants
$c_1, \ldots, c_N$ not all 0, such that

(1.7.1)    $\sum_j c_j x_j(t) = 0$

with probability 1, for all $t$. Equation (1.7.1) is true if and only if

(1.7.2)    $E\{[\sum_j c_j x_j(t)]^2\} = \sum_{j,k} (R(0))_{jk}\, c_j c_k = 0,$

that is if and only if the correlation matrix $R(0)$ is singular. If a non-degenerate
process is a direct product of factor processes, the latter are also non-degenerate.
The only degenerate one-dimensional process is that of type M(0).

















2. The structure of degenerate and deterministic processes


THEOREM 2.1. Every degenerate t.h.G.M. process is the direct product of
processes of type M(0) and (in some cases) of a non-degenerate t.h.G.M. process.

In proving this theorem, it can be supposed that the original process has been
replaced by an equivalent process, if necessary, so that the symmetric
non-negative definite matrix $R(0)$ is in diagonal form, with only 0's and 1's down the
main diagonal, say 0 to the $\nu$th place and 1 thereafter. Then $x_j(t) = 0$ when
$j \le \nu$ and the process is obviously the direct product of $\nu$ processes of type M(0)
and an $(N - \nu)$-dimensional non-degenerate process.

THEOREM 2.2. Let $\{x(t)\}$ be the variables of a deterministic t.h.G.M. process,
with correlation function $R(t)$.

(i) The process is the direct product of factor processes of types M(0), M(±1),
M($e^{i\theta}$).

(ii) If the parameter $t$ of the process is restricted to the integers, there is a
non-singular matrix $A$ such that

(2.2.1)    $x(n) = A^n x(0),$
(2.2.2)    $R(n) = R(0)A^{*n}, \qquad n = 0, \pm 1, \pm 2, \ldots$
(2.2.3)    $R(0) = AR(0)A^*.$

The transition matrix $A$ is the transform $BOB^{-1}$ of an orthogonal matrix $O$. If
the process is non-degenerate, $A$ is uniquely determined.

(ii') If the parameter of the process runs through all real numbers, there is a
matrix $Q$ such that

(2.2.1')    $x(t) = e^{tQ}x(0),$
(2.2.2')    $R(t) = R(0)e^{tQ^*}, \qquad -\infty < t < \infty,$
(2.2.3')    $QR(0) + R(0)Q^* = 0.$

The matrix $Q$ is the transform $BKB^{-1}$ of a skew symmetric matrix $K$. If the process
is non-degenerate, $Q$ is uniquely determined.

(iii) Conversely if $R(0)$ is any symmetric non-negative definite matrix, and
if $A$ ($Q$) is any matrix satisfying (2.2.3) ((2.2.3')), where $A$ is non-singular, there
is a deterministic t.h.G.M. process with correlation function given by (2.2.2) ((2.2.2'))
and satisfying (2.2.1) ((2.2.1')).

In proving (i), (ii) and (ii') it will be permissible to go to processes equivalent
to the original one, if convenient. Moreover if the given process can be expressed
as a direct product, it will be sufficient to prove (i), (ii) and (ii') for each factor.
Since (i), (ii) and (ii') are certainly true for processes of type M(0) (with A in
(ii) the identity, and Q in (ii') the null matrix), and since according to Theorem
2.1, processes of type M(0) can be factored out of the given process to leave a
non-degenerate remaining factor, if any, it will be sufficient to prove (i), (ii) and
(ii') for non-degenerate processes.




Proof (t integral) of (i) and (ii) for non-degenerate processes. If the process
determined by {x(n)} is deterministic, (1.6.1) is true. Hence

(2.2.4)    $x(\nu + 1) = Ax(\nu).$

Then (2.2.1) is true for n ≥ 0, and will be established for all n as soon as it is
shown that A is non-singular. Using (2.2.1),

(2.2.5)    $R(n) = E\{x(0) \cdot x(n)\} = E\{x(0) \cdot A^n x(0)\} = R(0)A^{*n},$

and

(2.2.6)    $R(0) = E\{x(1) \cdot x(1)\} = E\{Ax(0) \cdot Ax(0)\} = AR(0)A^*.$


Under the present hypotheses, R(0) is non-singular. Then A is determined
uniquely by (2.2.5) with n = 1, and A cannot be singular because of (2.2.6).
There is an equivalent process in which R(0) is the identity. Considering this
process, (2.2.6) becomes $I = AA^*$, so that A is orthogonal. Finally there is an
equivalent process (obtained by an orthogonal change of variables) in which
R(0) is still the identity and the matrix A is in the (real) normal form of
orthogonal matrices: all the elements of A are 0 except for two-dimensional rotation
matrices or 1's or −1's down the main diagonal. It is now obvious that the
process is the direct product of processes of types M(±1), $M(e^{i\theta})$.
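
As a numerical illustration of (2.2.2) and (2.2.3), the following minimal sketch takes R(0) to be the identity and A to be a single two-dimensional rotation block of the normal form. The angle 0.7 and all function names are assumptions chosen for illustration; matrices are stored as plain lists of rows.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Transpose, written A* in the text for real matrices."""
    return [list(col) for col in zip(*X)]

def rotation(theta):
    """One two-dimensional rotation block of the real normal form."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

def correlation(R0, A, n):
    """R(n) = R(0) A*^n for n >= 0, following (2.2.2)."""
    R = R0
    for _ in range(n):
        R = matmul(R, transpose(A))
    return R

def max_abs_diff(X, Y):
    """Largest entrywise difference between two matrices."""
    return max(abs(a - b) for rx, ry in zip(X, Y) for a, b in zip(rx, ry))

I2 = [[1.0, 0.0], [0.0, 1.0]]
A = rotation(0.7)  # hypothetical angle, chosen only for the example

# (2.2.3) with R(0) = I reduces to A A* = I.
defect = max_abs_diff(matmul(matmul(A, I2), transpose(A)), I2)
```

With these choices `defect` is zero up to rounding, and `correlation(I2, A, n)` is the transpose of the rotation through n times the angle, which is the oscillating correlation function of a factor of type $M(e^{i\theta})$.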

Proof (t continuously varying) of (i) and (ii') for non-degenerate processes. If
the t.h.G.M. process determined by {x(t)} is deterministic, (1.6.1) is true.
Hence

(2.2.5')    $R(t) = E\{x(s) \cdot x(s+t)\} = R(0)A(t)^*,$
(2.2.6')    $R(0) = E\{x(s+t) \cdot x(s+t)\} = A(t)R(0)A(t)^*.$

The matrix A(t) is uniquely determined by (2.2.5') since R(0) is non-singular.
It then follows from (1.6.1) that

(2.2.7)    $A(s+t) = A(s)A(t).$

The continuity hypothesis (1.3.1) becomes

(2.2.8)    $\lim_{t \to 0} R(0)A(t)^* = R(0),$

which implies that

(2.2.9)    $\lim_{t \to 0} A(t) = I.$

It is well known that any solution of (2.2.7) and (2.2.9) can be written in the
form $A(t) = e^{tQ}$. If now the right side of (2.2.6') is expanded in powers of t
and the coefficient of t is set equal to 0, the resulting equation is (2.2.3'). It can
be supposed, going to an equivalent process if necessary, that R(0) is the identity.
Then $A(t)A(t)^* = I$, $Q + Q^* = 0$. An equivalent process can be chosen so
































that R(0) is still the identity, and so that Q is in the real canonical form of skew
symmetric matrices: its elements vanish except for possible two rowed matrices

$\begin{pmatrix} 0 & \theta \\ -\theta & 0 \end{pmatrix}$

down the main diagonal. It is now clear that the non-degenerate process is a
direct product of factors of type $M(e^{i\theta})$ corresponding to two rowed matrices
just described, and factors of type M(1).
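
The passage from a skew symmetric Q to the orthogonal matrices $e^{tQ}$ can be checked numerically. The sketch below is only an illustration under stated assumptions: it sums a truncated power series for $e^{tQ}$ (40 terms, ample for the small arguments used), and the value θ = 0.8 and the helper names are hypothetical.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def expm(Q, t, terms=40):
    """Truncated power series for e^{tQ}; adequate for small |t| * |Q|."""
    n = len(Q)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # After this step, term holds (tQ)^k / k!.
        term = matmul(term, [[t * q / k for q in row] for row in Q])
        result = [[r + s for r, s in zip(rr, rs)]
                  for rr, rs in zip(result, term)]
    return result

theta = 0.8                           # hypothetical value
K = [[0.0, theta], [-theta, 0.0]]     # two rowed skew symmetric block
E = expm(K, 1.5)                      # A(t) at t = 1.5
```

For this block the series sums to the rotation with entries cos tθ and ±sin tθ, so `E` should agree with the rotation through the angle 1.5 × 0.8 = 1.2, and (2.2.3') with R(0) the identity is just K + K* = 0.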

Proof of (iii). If R(0) and A(Q) satisfy the conditions of Theorem 2.2 (iii),
choose x(0) as any Gaussian variable with correlation matrix R(0). Then
define x(n) by (2.2.1) ((2.2.1')). The resulting stochastic process is temporally
homogeneous if and only if $E\{x(s) \cdot x(s+t)\}$ depends only on t. The details of
the calculation will be carried out only for t integral.
In the first place

(2.2.10)    $E\{x(n) \cdot x(n+\nu)\} = E\{A^n x(0) \cdot A^{n+\nu}x(0)\} = A^n R(0)A^{*(n+\nu)}.$

Now (2.2.3) can be developed further:

(2.2.11)    $R(0) = AR(0)A^* = A^2R(0)A^{*2} = \cdots,$

so that (2.2.10) reduces to

(2.2.12)    $E\{x(n) \cdot x(n+\nu)\} = R(0)A^{*\nu}.$

The process is thus temporally homogeneous, and obviously satisfies the other
parts of the definition of a deterministic t.h.G.M. process. Theorem 2.2 is now
completely proved.
The restriction imposed on R(0), A(Q) by (2.2.3) ((2.2.3')) is quite loose.
Given R(0), there is always an A(Q) satisfying (2.2.3) ((2.2.3')), for example the
identity (null matrix). Given an A which is the transform of an orthogonal
matrix (a Q which is the transform of a skew symmetric matrix) there is always
a corresponding R(0): in fact A(Q) can be assumed to be orthogonal (skew
symmetric) and then R(0) can be taken as the identity.

3. T.H.G.M. processes with an integral valued parameter

In this section, the parameter t will range through all the integers. The
condition (1.5.3) that a t.h.G. process be a t.h.G.M. process can be simplified in the
integral parameter case. In fact it will be shown that it is sufficient if

(3.1.1)    $E\{\cdots, x(n-1), x(n);\ x(n+1)\} = E\{x(n);\ x(n+1)\}, \qquad n = 0, \pm 1, \cdots,$

with probability 1. If (3.1.1) is true,

(3.1.2)    $x(\nu+1) = Ax(\nu) + \eta(\nu)$


where A is the transition matrix of the process and η(ν) has mean 0 and is
independent of ..., x(ν − 1), x(ν). It follows that

(3.1.3)    $x(n) = A^{n-m}x(m) + A^{n-m-1}\eta(m) + A^{n-m-2}\eta(m+1) + \cdots + \eta(n-1).$

The terms involving the η(j) are all independent of ..., x(m − 1), x(m). This
equation therefore implies that

(3.1.4)    $E\{\cdots, x(m-1), x(m);\ x(n)\} = A^{n-m}x(m) = E\{x(m);\ x(n)\},$

and (3.1.4) is precisely the condition that the process has the Markoff property.
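
The step from the one-step relation (3.1.2) to the unrolled form (3.1.3) can be checked in the one-dimensional case (N = 1, so that A is a number a). The sketch below uses fixed, hypothetical noise values rather than random draws, so that the identity is purely algebraic; the function names are assumptions.

```python
def iterate(a, x_m, noise):
    """Apply x(v+1) = a x(v) + eta(v) once for each entry of noise."""
    x = x_m
    for eta in noise:
        x = a * x + eta
    return x

def unrolled(a, x_m, noise):
    """Right side of (3.1.3) for N = 1:
    a^(n-m) x(m) + sum over k of a^(n-m-1-k) eta(m+k)."""
    steps = len(noise)
    return a ** steps * x_m + sum(a ** (steps - 1 - k) * eta
                                  for k, eta in enumerate(noise))

noise = [0.3, -1.2, 0.5, 0.1]   # hypothetical values of eta(m), ..., eta(m+3)
gap = abs(iterate(0.6, 2.0, noise) - unrolled(0.6, 2.0, noise))
```

Setting every noise term to zero leaves only the first term of (3.1.3), which is the predictor $A^{n-m}x(m)$ appearing in (3.1.4).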
The following lemma will be useful. 
LEMMA 3.2. Let x: $(x_1, \cdots, x_N)$ be any Gaussian chance variable, with
$E\{x\} = 0$. Then there is a uniquely determined symmetric non-negative definite matrix
S, and a Gaussian chance variable y, such that

(3.2.1)    $E\{y \cdot y\} = I$

and

(3.2.2)    $x = Sy, \qquad S^2 = E\{x \cdot x\}.$

If x = Sy, and if S is symmetric, then the second equation in (3.2.2) is
certainly true. It is easily seen, by examination of the characteristic values and
vectors of the matrix $E\{x \cdot x\}$, that this matrix has a unique symmetric
non-negative definite square root S. Hence if there is an S satisfying (3.2.2), there
can be only one. The chance variables $x_1, \cdots, x_N$ can be written as linear
combinations of N uncorrelated Gaussian chance variables $\xi_1, \cdots, \xi_N$ satisfying
$E\{\xi \cdot \xi\} = I$:

(3.2.3)    $x = A\xi.$

If A is written in the polar form A = SU, where S is symmetric and
non-negative definite and U is orthogonal, (3.2.3) becomes

(3.2.4)    $x = Sy$

where y = Uξ satisfies (3.2.1).
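
For two-dimensional variables the square root S of Lemma 3.2 can be written in closed form. The sketch below is an illustration only, not the construction used in the proof: it relies on the Cayley-Hamilton identity $M^2 = \tau M - \delta I$ (τ the trace, δ the determinant of M), from which $S = (M + \sqrt{\delta}\, I)/\sqrt{\tau + 2\sqrt{\delta}}$ satisfies $S^2 = M$. The function name and the example matrix are assumptions.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def sym_sqrt_2x2(M):
    """Symmetric non-negative definite square root of a 2x2 symmetric
    non-negative definite matrix M."""
    tau = M[0][0] + M[1][1]
    delta = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    s = math.sqrt(max(delta, 0.0))    # guard against rounding below zero
    t = math.sqrt(tau + 2.0 * s)
    if t == 0.0:                      # M is the null matrix
        return [[0.0, 0.0], [0.0, 0.0]]
    return [[(M[0][0] + s) / t, M[0][1] / t],
            [M[1][0] / t, (M[1][1] + s) / t]]

# x = A xi with a hypothetical A; then E{x.x} = A A*.
A = [[1.0, 0.0], [1.0, 1.0]]
M = matmul(A, [list(col) for col in zip(*A)])
S = sym_sqrt_2x2(M)
```

Here M has rows (1, 1) and (1, 2), and `S` is the matrix with rows (2, 1) and (1, 3) divided by √5, whose square is M.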

It will be shown below that every t.h.G.M. process can be represented as the
direct product of factors of certain types. The deterministic types have already
been catalogued: M(0), M(±1), $M(e^{i\theta})$. The non-deterministic factor type
(integral valued parameter) will now be described.

M. Let {η(n)} be a sequence of mutually independent N-dimensional
Gaussian chance variables with 0 means and a common distribution function.
Let A be any N-dimensional square matrix. Define x(n) by

(3.3.1)    $x(n) = \sum_{m=0}^{\infty} A^m \eta(n-m)$










where it is supposed that¹¹ A is so chosen that the series converges with
probability 1. This will be true, for example, if all the characteristic values of A
have modulus less than 1, so that the terms of $A^m$ go exponentially to 0.¹² It
will be shown below that it is no restriction to assume that A has this character.
The variables {x(n)} determine a t.h.G.M. process. Since x(n) − Ax(n − 1)
is independent of ..., x(n − 2), x(n − 1), the x(n) process is a Markoff process
with transition matrix A:

(3.3.2)    $E\{\cdots, x(n-1);\ x(n)\} = Ax(n-1).$

A process defined in this way will be called a process of type M. A non-singular
change of variables y(n) = Bx(n) leads to a process of the same type:

(3.3.3)    $y(n) = \sum_{m=0}^{\infty} (BAB^{-1})^m B\eta(n-m).$


It will sometimes be convenient to write a process of type M in a form slightly
different from (3.3.1). Using Lemma 3.2 it is evident that there are Gaussian
variables {ξ(n)} satisfying

(3.3.4)    $E\{\xi(n)\} = 0, \qquad E\{\xi(m) \cdot \xi(n)\} = \delta_{m,n} I, \qquad m, n = 0, \pm 1, \cdots,$

and a symmetric non-negative definite matrix S such that η(n) = Sξ(n). Then
$S^2 = E\{\eta(n) \cdot \eta(n)\}$ and

(3.3.5)    $x(n) = \sum_{m=0}^{\infty} A^m S\xi(n-m).$

Under the change of variable y(n) = Bx(n), A becomes $BAB^{-1}$ and $S^2$ becomes
$BS^2B^*$.

The only condition on A required for convergence in (3.3.5) is that $A^mS \to 0$.
It will now be shown that A can always be assumed to have only characteristic
values of modulus less than 1, in the sense that there is an $\tilde{A}$ with this property,
and satisfying the equations

(3.3.6)    $\tilde{A}^mS = A^mS, \qquad m = 1, 2, \cdots.$

It is no restriction, going to an equivalent process if necessary, to assume that
the elements of A vanish except for those in two square submatrices down the
main diagonal, where one submatrix $A_1$ has only characteristic values of modulus
less than 1 and the other, $A_2$, of modulus greater than or equal to 1. If the
matrix S is written in terms of the corresponding submatrices:

(3.3.7)    $A = \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix}, \qquad S = \begin{pmatrix} S_1 & S_2 \\ S_3 & S_4 \end{pmatrix},$
¹¹ Throughout this paper, if A is any matrix, $A^0$ is defined as the identity matrix I.
¹² We shall use repeatedly Kolmogoroff's theorem that an infinite series of mutually
independent chance variables with zero means converges with probability 1 if the series of their
dispersions is convergent. (Kolmogoroff only states the theorem in one dimension, but the
extension to n dimensions is trivial.) If a series of mutually independent Gaussian variables
converges, the series of dispersions converges to the dispersion of the sum.




























































































the condition on A implies that $A_2^m S_4 \to 0$. If it is shown that $S_4 = 0$, it will
follow that $S_2 = S_3 = 0$, because S is symmetric and non-negative definite. The
matrix $\tilde{A}$ will then be defined by

(3.3.8)    $\tilde{A} = \begin{pmatrix} A_1 & 0 \\ 0 & 0 \end{pmatrix}$

and $\tilde{A}$ will satisfy (3.3.6) and will have only characteristic values of modulus
less than 1. The problem has thus been reduced to the case where $A_1$ is absent:
A only has characteristic values of modulus at least 1, and it must be proved
that $A^mS \to 0$ implies that S = 0. The proof of this is immediate when A is put
into its Jordan canonical form.

The symmetric non-negative definite matrix S satisfies the equations

(3.3.9)    $R(0) = \sum_{m=0}^{\infty} A^m S^2 A^{*m} = S^2 + AR(0)A^*.$
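
The two expressions for R(0) in (3.3.9) can be compared numerically by summing the series and substituting the sum into the right-hand equation. This is a sketch under assumptions: the matrices A and $S^2$ are hypothetical, the characteristic values of A are 0.5 and 0.3, and 200 terms are taken as a generous truncation.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def dispersion_series(A, S2, terms=200):
    """Partial sum of sum_{m >= 0} A^m S^2 A*^m, the series in (3.3.9)."""
    n = len(A)
    total = [[0.0] * n for _ in range(n)]
    term = [row[:] for row in S2]
    for _ in range(terms):
        total = add(total, term)
        term = matmul(matmul(A, term), transpose(A))
    return total

def residual(A, S2, R0):
    """Largest entry of R(0) - (S^2 + A R(0) A*)."""
    rhs = add(S2, matmul(matmul(A, R0), transpose(A)))
    return max(abs(a - b) for rx, ry in zip(R0, rhs) for a, b in zip(rx, ry))

A = [[0.5, 0.2], [0.0, 0.3]]      # hypothetical transition matrix
S2 = [[1.0, 0.0], [0.0, 1.0]]     # hypothetical S^2
R0 = dispersion_series(A, S2)
```

In the one-dimensional case with A = 0.5 and $S^2$ = 1 the series is geometric and sums to 4/3, which the same function reproduces.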

THEOREM 3.4. A direct product of processes of type M is also of type M. Con- 
versely any factor process of a process of type M is itself of type M. 

The direct part of the theorem is obvious. To prove the converse, suppose
an N-dimensional process of type M has an l-dimensional factor, corresponding
to the variables $x_1(n), \cdots, x_l(n)$. It can be supposed that all factors of type
M(0) are separated out, so that there are indices j, k: $1 \le j \le l \le k \le N$ such
that the $\{x_1(n), \cdots, x_j(n)\}$ and $\{x_{k+1}(n), \cdots, x_N(n)\}$ processes are
non-degenerate and that the variables $x_{j+1}(n), \cdots, x_k(n)$ vanish identically. Making
a change of variables, if necessary, it can be supposed that R(0) has the form

(3.4.1)    $R(0):\ \begin{pmatrix} I & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & I \end{pmatrix}$    (diagonal blocks of dimensions j, k − j, N − k)

and that R(n) has the blocks of zeros indicated in (3.4.1). Since R(1) = R(0)A*,
A must have the form


(3.4.2) 


Then $A^m$ will have this same form. Finally, because of (3.3.9), $S^2$, and therefore
S, must have the form

— 0 0 
(3.4.3) S:10 0 0 

0 0 — 
Let $A_0$ be the matrix whose elements are the same as those of A except that the
(j + 1)th to kth columns of $A_0$ vanish. Since

(3.4.4)    $A_0^mS = A^mS, \qquad m = 0, 1, \cdots,$










it follows that

(3.4.5)    $x(n) = \sum_{m=0}^{\infty} A_0^m S\xi(n-m).$

It is now obvious that the $\{x_1(n), \cdots, x_l(n)\}$ process is of type M.

THEOREM 3.5. A non-degenerate process of type M has no deterministic factors.

Any factor process is non-degenerate and of type M. To prove the theorem,
it will therefore be sufficient to prove that the process itself cannot be
deterministic. If it were, we should have

$x(n) - Ax(n-1) = S\xi(n) = 0.$

Then S = 0. But then the process is certainly degenerate, contrary to
hypothesis.

THEOREM 3.6. (i) Every t.h.G.M. process (discrete parameter) is the direct
product of processes of type M(0), M(±1), $M(e^{i\theta})$, M.

(ii) Let A be a transition matrix of a t.h.G.M. process, with variables {x(n)}.
There are mutually independent Gaussian variables $\cdots, \xi(0), \xi(1), \cdots, \xi$
satisfying

(3.6.1)    $E\{\xi(n)\} = E\{\xi\} = 0, \qquad E\{\xi(n) \cdot \xi(n)\} = E\{\xi \cdot \xi\} = I,$

and symmetric non-negative definite matrices S, T such that

(3.6.2)    $x(n) = \sum_{m=0}^{\infty} A^m S\xi(n-m) + A^n T\xi, \qquad n = 0, 1, \cdots,$
(3.6.3)    $T^2 = AT^2A^*,$
(3.6.4)    $R(0) = \sum_{m=0}^{\infty} A^m S^2 A^{*m} + T^2 = AR(0)A^* + S^2,$

where the series in (3.6.2) converges with probability 1. If A is non-singular, (3.6.2)
holds for all n. The sum and last term in (3.6.2) are linear transformations of x(n):
(3.6.2) exhibits in part the decomposition into factor processes described in (i). The
correlation function is given by

(3.6.5)    $R(n) = R(0)A^{*n}, \qquad R(-n) = A^nR(0), \qquad n \ge 0.$

(iii) The transition matrix A is uniquely determined if and only if the process
is non-degenerate. In any case, there is a transition matrix whose characteristic
values are all of modulus less than or equal to 1, and whose characteristic values of
modulus 1 correspond to simple elementary divisors. The transition matrix A
furnishes the solution of the prediction problem of the process:

(3.6.6)    $E\{\cdots, x(m-1), x(m);\ x(m+n)\} = A^n x(m), \qquad n = 0, 1, \cdots.$




The matrix $S^2$, which is uniquely determined, measures the dispersion of x(n + 1)
about its predicted value in terms of x(n):

(3.6.7)    $E\{[x(n+1) - Ax(n)]^2\} = S^2.$

(iv) Conversely if A is a matrix with at least one characteristic value of modulus
less than 1 or of modulus 1 and corresponding to a simple elementary divisor, A is the
transition matrix of a t.h.G.M. process, with R(0) not the null matrix. If all the
characteristic values of A are as just described, A is the transition matrix of a
non-degenerate t.h.G.M. process. If R(0), S, A are matrices satisfying (3.6.4) with
R(0), S symmetric and non-negative definite, there is a t.h.G.M. process whose
variables can be written in the form (3.6.2) with the given R(0), S, A.

This decomposition of a t.h.G.M. process into deterministic factors can be
considered as a special case of the general decomposition theorem of Wold, which
is applicable to all t.h.G. processes.¹³ (Wold only considered the one-dimensional
case.) The proof in the present special case is simpler, however, and
illuminates the general case.

Proof of (i) and (ii). Equations (1.5.3) and (1.5.4), in the present case,
lead to

(3.6.8)    $E\{\cdots, x(n-1), x(n);\ x(n+1)\} = E\{x(n);\ x(n+1)\} = Ax(n).$

The first two terms are equal because the process has the Markoff property.
The last term is linear in x(n) because the process is Gaussian. The matrix A
can be taken independent of n because the process is temporally homogeneous.
Thus (3.6.8) involves the three fundamental properties of the x(n) process.
From the definition of conditional expectation, it follows that x(n + 1) − Ax(n)
is independent of the chance variables ..., x(n − 1), x(n). Hence the variables

$\cdots, [x(n) - Ax(n-1)], [x(n+1) - Ax(n)], \cdots$

are mutually independent. According to Lemma 3.2, there are mutually
independent chance variables {ξ(n)} satisfying

(3.6.9)    $x(n) - Ax(n-1) = S\xi(n), \qquad E\{\xi(n) \cdot \xi(n)\} = I, \qquad E\{\xi(n)\} = 0,$

where S is symmetric, non-negative definite, and satisfies (3.6.7). The matrix
$S^2$ thus measures the dispersion of x(n) about its predicted value Ax(n − 1).
The representation (3.6.2) can be obtained very simply. In fact

(3.6.10)    $x(n) = [x(n) - Ax(n-1)] + A[x(n-1) - Ax(n-2)] + \cdots + A^{n-\nu-1}[x(\nu+1) - Ax(\nu)] + A^{n-\nu}x(\nu)$
            $= \sum_{j=0}^{n-\nu-1} A^j S\xi(n-j) + A^{n-\nu}x(\nu),$
¹³ A Study in the Analysis of Stationary Time Series, Uppsala (1938), p. 89. See also
Kolmogoroff, Bull. Acad. Sci. URSS Ser. Math., Vol. 5 (1941), pp. 3-14 and Bolletin
Moskovskogo Gosudarstvenogo, Mathematika, Vol. 2 (1941), pp. 1-40, in whose papers the
decomposition theorem is brought out in its full significance.







and it will be shown that when ν → −∞ (3.6.10) leads to (3.6.2). Before going
to the limit, however, we note that in (3.6.10) the sum is independent of the
variables ..., x(ν − 1), x(ν), so that

(3.6.11)    $E\{\cdots, x(\nu-1), x(\nu);\ x(n)\} = A^{n-\nu}x(\nu)$

which is another way of writing (3.6.6). Moreover, using (3.6.11),

(3.6.12)    $R(n-\nu) = E\{x(\nu) \cdot x(n)\} = R(0)A^{*(n-\nu)},$

which is another way of writing (3.6.5). (The value of R(n) for n < 0 is
obtained using the fact that R(−n) = R(n)*.) The last term in (3.6.10) is the
conditional expectation of x(n) for preassigned ..., x(ν − 1), x(ν). It follows
from the corollary to Theorem 1.2 that this conditional expectation converges
with probability 1 when ν → −∞, but this convergence will be proved directly
in the present particular case.

From (3.6.10),

(3.6.13)    $E\{x(n) \cdot x(n)\} = R(0) = \sum_{j=0}^{n-\nu-1} A^j S^2 A^{*j} + A^{n-\nu}R(0)A^{*(n-\nu)}.$

The terms of the sum and the last term are all symmetric and non-negative
definite matrices. It follows that there is convergence in (3.6.13) when
ν → −∞:

(3.6.14)    $R(0) = \sum_{j=0}^{\infty} A^j S^2 A^{*j} + \lim_{m \to \infty} A^m R(0)A^{*m}.$

The convergence of the series of dispersions in (3.6.14) implies that the series of
chance variables in (3.6.2) converges, with probability 1. Then when ν → −∞
(3.6.10) becomes

(3.6.15)    $x(n) = \sum_{j=0}^{\infty} A^j S\xi(n-j) + z(n),$

where

(3.6.16)    $z(n) = \lim_{\nu \to -\infty} A^{n-\nu}x(\nu).$

Since x(ν) is independent of ξ(ν + 1), ξ(ν + 2), ..., z(n) is independent of every
ξ(m). Moreover, writing z(0) = Tξ, where ξ satisfies (3.6.1) and T is symmetric
and non-negative definite,

(3.6.17)    $z(n) = A^n z(0) = A^n T\xi, \qquad n \ge 0.$

Thus (3.6.3) and (3.6.4) are satisfied. If A is non-singular, (3.6.17) will be
correct for negative n also.

The decomposition of the process into factor processes of the types described
in the theorem will be obtained by a detailed analysis of the significance of
(3.6.2). Under the change of variable y(n) = Bx(n), $T^2$ becomes $BT^2B^*$, and







A becomes $BAB^{-1}$. Making a suitable change of variables, if necessary, it can
be supposed that A has the form

(3.6.18)    $A:\ \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix}$

where the characteristic values of $A_1$ have moduli unequal to 1 and those of $A_2$
have modulus 1. The matrix $T^2$ can be written in terms of submatrices of the
same dimensions in a corresponding way:


(3.6.19)    $T^2:\ \begin{pmatrix} T_1^2 & T_3 \\ T_3^* & T_2^2 \end{pmatrix}$


where $T_1$, $T_2$ are symmetric and non-negative definite. A further change of
variables may be made, if necessary (transforming only the last n variables)
preserving the forms (3.6.18) and (3.6.19) and transforming $T_2$ into the identity.
Then using (3.6.3)

(3.6.20)    $A_1T_1^2A_1^* = T_1^2, \qquad A_2A_2^* = I.$

Hence $A_2$ is orthogonal. Developing (3.6.20) further, $A_1^mT_1^2A_1^{*m} = T_1^2$, for all
m ≥ 0. When m → ∞ in this equation, the terms in the matrix product on the
left involve the mth power of the characteristic values of $A_1$ (all of modulus
different from 1, by hypothesis). Then those characteristic values which actually
appear can only be those of modulus less than 1, and the matrix on the left must
go to 0 as m → ∞: $T_1 = 0$. Since T is non-negative definite, T must have the
simple form

(3.6.21)    $T:\ \begin{pmatrix} 0 & 0 \\ 0 & I \end{pmatrix}.$


The matrix S can also be divided into corresponding submatrices:

(3.6.22)    $S:\ \begin{pmatrix} S_1 & S_3 \\ S_3^* & S_2 \end{pmatrix}.$


The convergence of the series in (3.6.4) implies that

$\lim_{m \to \infty} A_2^m S_2^2 A_2^{*m} = 0.$

Since $A_2$ is orthogonal, this means that $S_2 = 0$, and since S is symmetric and
non-negative definite, S has the form

$S:\ \begin{pmatrix} S_1 & 0 \\ 0 & 0 \end{pmatrix}.$

It is now clear from (3.6.2) that the x(n) process is the direct product of a process
of type M and a deterministic process corresponding to the division of the
variables determining the above submatrices. The deterministic factor process is
the direct product of the elementary types already discussed. The variable
z(n) and the sum in (3.6.2) are linear transformations of x(n).
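
A small worked instance may make the block structure concrete. The sketch below assumes a two-dimensional process whose first coordinate is a factor of type M with transition coefficient 0.5 and whose second coordinate is a factor of type M(1); the particular matrices A, S, T are hypothetical choices in the block forms just described, and the series in (3.6.4) is truncated at 200 terms.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def max_abs_diff(X, Y):
    return max(abs(a - b) for rx, ry in zip(X, Y) for a, b in zip(rx, ry))

def dispersion_series(A, S2, terms=200):
    """Partial sum of sum_{m >= 0} A^m S^2 A*^m."""
    n = len(A)
    total = [[0.0] * n for _ in range(n)]
    term = [row[:] for row in S2]
    for _ in range(terms):
        total = add(total, term)
        term = matmul(matmul(A, term), transpose(A))
    return total

A = [[0.5, 0.0], [0.0, 1.0]]    # A_1 = 0.5 (modulus < 1), A_2 = 1 (orthogonal)
S2 = [[1.0, 0.0], [0.0, 0.0]]   # S^2 with only the S_1 block present
T2 = [[0.0, 0.0], [0.0, 1.0]]   # T^2 with the identity in the A_2 block

R0 = add(dispersion_series(A, S2), T2)               # first form in (3.6.4)
rhs = add(matmul(matmul(A, R0), transpose(A)), S2)   # second form in (3.6.4)
```

The series converges even though A has a characteristic value of modulus 1, because S vanishes on that block; the resulting R(0) is diagonal with entries 4/3 and 1.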







Proof of (iii). If the process is non-degenerate, R(0) is non-singular, and the
transition matrix is determined uniquely by (3.6.5) with n = 1. If the process
is degenerate, there will be one or more factor processes of type M(0), and their
transition matrices are quite unrestricted. In the non-degenerate case the
characteristic values will be of modulus less than 1 (corresponding to a factor of
type M, if one is present), or equal to 1 (corresponding to the factors of type
M(±1), $M(e^{i\theta})$ making up the deterministic factor, if one is present), and in the
latter case the elementary divisors are simple. If the process is degenerate, and
if the part of A corresponding to the factors of type M(0) is taken to be the
identity, there will be simple elementary divisors corresponding to the characteristic
value 1 for each such factor. The remaining statements of (iii) have already
been proved.

Proof of (iv). Let A be a matrix with at least one characteristic value of
modulus less than 1 or equal to 1 and corresponding to a simple elementary
divisor. Then some transform $BAB^{-1}$ has the form

(3.6.24)    $\begin{pmatrix} A_1 & 0 & 0 \\ 0 & A_2 & 0 \\ 0 & 0 & A_3 \end{pmatrix}$

where $A_1$ (if present) has only characteristic values of modulus less than 1, $A_2$
(if present) is orthogonal, and both $A_1$, $A_2$ are not absent. For the purposes
of the present proof it can be supposed that A is already in this form. Define
S, T by

(3.6.25)    $S:\ \begin{pmatrix} S_1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad T:\ \begin{pmatrix} 0 & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & 0 \end{pmatrix}$

where the indicated submatrices of S and T are in the same positions as those of
A, and where $S_1$ is any symmetric positive definite (and therefore non-singular)
matrix of the proper dimension. The series in (3.6.4) converges and the first
equation in (3.6.4) defines a matrix R(0) which obviously satisfies the continued
equality. If all the characteristic values of A are as described in the beginning
of this paragraph, $A_3$ can be supposed absent. In this case

$R(0) = S^2 + T^2 + \cdots$

is non-singular. The proof of the first two parts of (iv) has now been reduced
to that of the last part. Suppose then that R(0), A, S satisfy the hypotheses
of the last part of (iv). Then

(3.6.26)    $R(0) = AR(0)A^* + S^2$
            $AR(0)A^* = A^2R(0)A^{*2} + AS^2A^*$
            $\cdots$
            $A^{n-1}R(0)A^{*(n-1)} = A^nR(0)A^{*n} + A^{n-1}S^2A^{*(n-1)}.$













Adding these equations

(3.6.27)    $R(0) = \sum_{m=0}^{n-1} A^m S^2 A^{*m} + A^nR(0)A^{*n}.$

This equation leads to (3.6.14), and $T^2$, defined as the limit in (3.6.14), satisfies
(3.6.3). Let $\cdots, \xi(-1), \xi(0), \cdots, \xi$ be mutually independent Gaussian
variables satisfying (3.6.1). Then the x(n) defined by (3.6.2) determine the
variables of a Gaussian process with non-negative values of n, but a slight
modification is needed to obtain an expression defined for all n. To obtain this, it can
be supposed that A, T, S are in the forms (3.6.24), (3.6.25). Define $\tilde{A}$ by

(3.6.28)    $\tilde{A}:\ \begin{pmatrix} I & 0 & 0 \\ 0 & A_2 & 0 \\ 0 & 0 & I \end{pmatrix}.$

Then $\tilde{A}$ is orthogonal and $\tilde{A}T = AT$. If now (3.6.2) is used to define x(n) for
all n with $A^nT$ replaced by $\tilde{A}^nT$, the x(n) process is a t.h.G.M. process with the
desired properties.

The properties of the process reversed in time are of some interest. It is easy
to see that if n is replaced by −n, a t.h.G.M. process remains a t.h.G.M. process.
If the original process is non-degenerate, the new transition matrix is
$R(0)A^*R(0)^{-1}$. If the transition matrix remains unchanged when n is replaced
by −n, $R(0)A^*R(0)^{-1} = A$. This is equivalent to the equation R(n) = R(−n).
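
The reversed transition matrix can be computed directly in two dimensions. The following sketch is illustrative only; the matrices R(0) and A are hypothetical (chosen so that $R(0) - AR(0)A^*$ is positive definite, as (3.6.4) requires), and the helper names are assumptions.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def inverse_2x2(M):
    """Inverse of a non-singular 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def reversed_transition(R0, A):
    """Transition matrix R(0) A* R(0)^{-1} of the process reversed in time."""
    return matmul(matmul(R0, transpose(A)), inverse_2x2(R0))

def max_abs_diff(X, Y):
    return max(abs(a - b) for rx, ry in zip(X, Y) for a, b in zip(rx, ry))

R0 = [[2.0, 0.0], [0.0, 1.0]]     # hypothetical R(0)
A = [[0.5, 0.2], [0.0, 0.3]]      # hypothetical transition matrix
B = reversed_transition(R0, A)

# The reversed process has R'(1) = R(-1) = A R(0), which must equal R(0) B*.
check = max_abs_diff(matmul(R0, transpose(B)), matmul(A, R0))
```

When R(0) is the identity the reversed matrix is simply A*, so a symmetric A is unchanged by the reversal, in agreement with the condition R(n) = R(−n).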

The simplest generalization of a t.h.G.M. process is the following. Let the
chance variables {y(n)} determine a t.h.G. process with the property that for
some N > 0,

(3.7.1)    $E\{\cdots, y(n-1);\ y(n)\} = E\{y(n-N), \cdots, y(n-1);\ y(n)\},$

with probability 1. If N = 1, the process is a t.h.G.M. process. This type of
process will be called a t.h.G.M$_N$ process. To avoid notational complications
only the one-dimensional case will be considered. The right hand side of (3.7.1)
is a linear combination of the variables y(n − N), ..., y(n − 1). The variables
thus satisfy a difference equation of the form

(3.7.2)    $y(n) - a_1y(n-1) - \cdots - a_Ny(n-N) = \eta(n)$

generalizing (3.6.9), where η(n) is independent of the chance variables ...,
y(n − 2), y(n − 1). The {η(n)} are mutually independent chance variables
with zero means and dispersions independent of n. Equation (3.7.2) leads to

(3.7.3)    $y(n) - a_1^{(n-m)}y(m-1) - \cdots - a_N^{(n-m)}y(m-N) = \eta^{(n-m)}(n) \qquad (m \le n)$

where $\eta^{(n-m)}(n)$ has zero mean and is independent of the chance variables ...,
y(m − 2), y(m − 1). Hence

(3.7.4)    $E\{\cdots, y(m-1);\ y(n)\} = E\{y(m-N), \cdots, y(m-1);\ y(n)\}, \qquad m \le n.$







The difference equation (3.7.2) has been studied in some detail in the past.¹⁴
We shall use an approach which adds insight into the structure of the solution
and which clarifies the place of the solution in the general theory of t.h.G.
processes. This approach is in terms of N-dimensional t.h.G.M. processes.
Define the variables {x(n)} of an N-dimensional process by

(3.7.5)    $x_j(n) = y(n+j), \qquad n = 0, \pm 1, \cdots, \quad j = 1, \cdots, N.$

The x(n) process is evidently a t.h.G.M. process. If the index N of the y(n)
process is the minimum for which (3.7.1) is true, the corresponding x(n) process
will be non-degenerate. Then the transition matrix A is uniquely determined,
and is evidently

(3.7.6)    $A:\ \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_N & a_{N-1} & a_{N-2} & \cdots & a_1 \end{pmatrix}, \qquad a_N \ne 0.$

The matrix S, measuring the dispersion of the prediction Ax(n − 1) of x(n),
has the form

(3.7.7)    $S:\ \begin{pmatrix} 0 & \cdots & 0 & 0 \\ \vdots & & \vdots & \vdots \\ 0 & \cdots & 0 & 0 \\ 0 & \cdots & 0 & \sigma \end{pmatrix}.$

The characteristic equation of A is simply

(3.7.8)    $\lambda^N - a_1\lambda^{N-1} - \cdots - a_N = 0.$

The matrix A has only a single characteristic vector corresponding to each
characteristic value λ, the vector $(1, \lambda, \cdots, \lambda^{N-1})$. Hence if λ is a multiple root of
(3.7.8), it does not correspond to a simple elementary divisor. Therefore,
according to Theorem 3.6, all roots of (3.7.8) of modulus 1 must be simple roots.
It will be proved below that either no roots have modulus 1 or all roots have
modulus 1.
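
The relation between the matrix (3.7.6), the equation (3.7.8) and the characteristic vector $(1, \lambda, \cdots, \lambda^{N-1})$ can be verified on a small case. In this sketch the coefficients $a_1 = 0.5$, $a_2 = 0.06$ are hypothetical, chosen so that (3.7.8) factors as (λ − 0.6)(λ + 0.1); the function names are assumptions.

```python
def companion(a):
    """Matrix (3.7.6) for coefficients a = [a_1, ..., a_N]."""
    N = len(a)
    A = [[0.0] * N for _ in range(N)]
    for j in range(N - 1):
        A[j][j + 1] = 1.0                        # ones above the diagonal
    A[N - 1] = [a[N - 1 - i] for i in range(N)]  # last row a_N, ..., a_1
    return A

def apply(A, v):
    """Matrix times column vector."""
    return [sum(r * x for r, x in zip(row, v)) for row in A]

def char_poly(a, lam):
    """Left side of (3.7.8): lam^N - a_1 lam^(N-1) - ... - a_N."""
    N = len(a)
    return lam ** N - sum(a[k] * lam ** (N - 1 - k) for k in range(N))

def power_vector(lam, N):
    """The vector (1, lam, ..., lam^(N-1))."""
    return [lam ** j for j in range(N)]

a = [0.5, 0.06]                   # hypothetical a_1, a_2
A = companion(a)
v = power_vector(0.6, 2)
Av = apply(A, v)                  # should equal 0.6 * v
```

Both roots here have modulus less than 1, which is the non-deterministic situation treated as Case 2 in the text.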

If an N-dimensional non-degenerate t.h.G.M. process is given whose transition
matrix A and dispersion matrix S have the forms (3.7.6) and (3.7.7) respectively,

$x_j(n) - x_{j+1}(n-1) = 0$

with probability 1, for j < N. Then a y(n) process can be defined
unambiguously by (3.7.5). Since for fixed j, $x_j(n)$ determines a one-dimensional t.h.G.
process, the y(n) process is a t.h.G. process, and (3.7.1) is obviously true, with
N minimal if A is non-singular.

Case 1. S = 0 (deterministic case). In this case the x(n) process is
deterministic, and the y(n) process satisfies the equation

(3.7.2')    $y(n) = a_1y(n-1) + \cdots + a_Ny(n-N).$

¹⁴ Cf. for example H. Wold, A Study in the Analysis of Stationary Time Series, Uppsala,
1938.




All the roots of (3.7.8) are simple roots, of modulus 1. Since S = 0,

(3.7.9)    $x(n) = A^nT\xi, \qquad n = 0, \pm 1, \cdots,$

and therefore

(3.7.10)    $y(n) = x_1(n-1) = \sum_{j=1}^{N} (A^{n-1}T)_{1j}\,\xi_j, \qquad n = 0, \pm 1, \cdots.$

Using either the well known form of the solution of the Nth order difference
equation (3.7.2') or of the powers of an orthogonal matrix, it follows that

(3.7.11)    $y(n) = \sum_{j=1}^{N} (\eta_j \cos n\theta_j + \zeta_j \sin n\theta_j)$

where the $\eta_j$ and $\zeta_j$ are (one-dimensional) Gaussian variables, and

$\{\cos\theta_j + i\sin\theta_j\}$

are the N distinct characteristic values of A, that is the roots of (3.7.8).

Case 2. S ≠ 0 (non-deterministic case). In this case it will now be shown that
the x(n) process can have no deterministic factors: that is that the roots of
(3.7.8) all have modulus less than 1. In fact let β be a root of (3.7.8),
corresponding to the characteristic vector z of A*:


(3.7.12)    $z:\ (a_N\beta^{N-1},\ a_N\beta^{N-2} + a_{N-1}\beta^{N-1},\ \cdots,\ a_N + a_{N-1}\beta + \cdots + a_1\beta^{N-1}) = (a_N\beta^{N-1}, \cdots, \beta^N).$


Then using (3.6.4),

(3.7.13)    $(R(0)z, z) = (AR(0)A^*z, z) + (S^2z, z)$
            $= (R(0)A^*z, A^*z) + (Sz, Sz)$
            $= |\beta|^2 (R(0)z, z) + \sigma^2|\beta|^{2N}.$

Hence |β| cannot be 1, and the x(n) process can have no deterministic factors.
Equation (3.6.2) becomes

(3.7.14)    $x(n) = \sum_{m=0}^{\infty} A^m S\xi(n-m)$

which leads to

(3.7.15)    $y(n) = \sum_{j=1}^{N} \sum_{m=0}^{\infty} (A^mS)_{1j}\,\xi_j(n-m-1) = \sigma \sum_{m=0}^{\infty} (A^m)_{1N}\,\xi_N(n-m-1).$

According to Theorem 3.6 the only restrictions on the coefficients $a_1, \cdots, a_N$
in the two cases S = 0, S ≠ 0, are respectively that equation (3.7.8) has N
distinct roots of modulus 1 and all roots of modulus less than 1. Hence (3.7.10)
and (3.7.15) furnish (with the stated restrictions on A) the most general
t.h.G.M$_N$ processes.
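
The coefficients $(A^m)_{1N}$ in (3.7.15) can be compared with the response of the difference equation (3.7.2) to a single unit disturbance. The sketch below assumes the usual recursion $\psi_0 = 1$, $\psi_k = a_1\psi_{k-1} + \cdots + a_N\psi_{k-N}$ (with $\psi_k = 0$ for k < 0), which is not written out in the text; under that assumption $(A^m)_{1N}$ vanishes for m < N − 1 and equals $\psi_{m-N+1}$ afterwards. The coefficient values and names are hypothetical.

```python
def companion(a):
    """Matrix (3.7.6) for coefficients a = [a_1, ..., a_N]."""
    N = len(a)
    A = [[0.0] * N for _ in range(N)]
    for j in range(N - 1):
        A[j][j + 1] = 1.0
    A[N - 1] = [a[N - 1 - i] for i in range(N)]
    return A

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def top_right_powers(a, count):
    """(A^m)_{1N} for m = 0, ..., count - 1."""
    A = companion(a)
    N = len(a)
    P = [[float(i == j) for j in range(N)] for i in range(N)]  # A^0 = I
    out = []
    for _ in range(count):
        out.append(P[0][N - 1])
        P = matmul(P, A)
    return out

def impulse_response(a, count):
    """psi_0, ..., psi_{count-1} from the assumed recursion."""
    psi = []
    for k in range(count):
        if k == 0:
            psi.append(1.0)
        else:
            psi.append(sum(a[i] * psi[k - 1 - i]
                           for i in range(len(a)) if k - 1 - i >= 0))
    return psi

a = [0.5, 0.06]                      # hypothetical a_1, a_2 (N = 2)
top = top_right_powers(a, 8)
psi = impulse_response(a, 8)
```

For these coefficients `top` begins 0, 1, 0.5, 0.31 and `psi` begins 1, 0.5, 0.31, so the two sequences agree after the shift by N − 1 = 1.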







It was shown in Theorem 3.6 that if R(n) is the correlation function of a
t.h.G.M. process, R(n) can be expressed in the form (3.6.5), where A is some
suitably chosen matrix. Conversely if the correlation function of a t.h.G.
process has the form (3.6.5), the process is a t.h.G.M. process since x(n + 1) −
Ax(n) is then orthogonal to (and therefore independent of) the variables
..., x(n − 1), x(n). (This fact implies the truth of (3.1.1).) The
characterization of t.h.G.M. processes in terms of their correlation functions is thus
easily solved. The following theorems characterize one-dimensional t.h.G.M$_N$
processes from various points of view. It will be convenient, and also
intrinsically interesting, to treat at the same time a slightly larger class of processes:
the class of component processes of t.h.G.M. processes. A one-dimensional t.h.G.
process with variables $\{x_1(n)\}$ will be called a component process of an
N-dimensional t.h.G.M. process if there are N − 1 t.h.G. processes with variables
$\{x_2(n)\}, \cdots, \{x_N(n)\}$ such that the N-dimensional process with variables
$\{x_1(n), \cdots, x_N(n)\}$ is a t.h.G.M. process. If the variables {x(n)} determine
an N-dimensional t.h.G.M. process, the t.h.G. processes determined by
$\{x_1(n)\}, \cdots, \{x_N(n)\}$ will be called its N component processes. If an $x_1(n)$
process is not of type M(0) and is a component process of an N-dimensional
t.h.G.M. process, it is a component process of a non-degenerate $N_1$-dimensional
t.h.G.M. process, for some $N_1 \le N$. It has already been seen that
one-dimensional t.h.G.M$_N$ processes are component processes of N-dimensional t.h.G.M.
processes.

THEOREM 3.8. Let ..., x(0), x(1), ... be the variables of a one-dimensional
t.h.G. process. The process is a component process of an N-dimensional t.h.G.M.
process if and only if the chance variables

(3.8.1)    $x(0), \quad E\{\cdots, x(-1), x(0);\ x(n)\}, \qquad n = 1, 2, \cdots,$

are linearly dependent on the first N.

Suppose that the x(n) process is a component process of an N-dimensional
y(n) process: $x(n) = y_1(n)$, with correlation function $R_y(n)$ and transition matrix
A. Since A satisfies its characteristic equation

(3.8.2)    $\det|\lambda I - A| = \lambda^N - a_1\lambda^{N-1} - \cdots - a_N = 0,$

it follows from (3.6.2) that if η(n + N) is defined by

(3.8.3)    $y(n+N) - a_1y(n+N-1) - \cdots - a_Ny(n) = \eta(n+N)$

then η(n + N) is independent of ..., y(n − 1), y(n). Then

(3.8.4)    $x(n+N) - a_1x(n+N-1) - \cdots - a_Nx(n) = \eta_1(n+N)$

where $\eta_1(n+N)$ is independent of the chance variables ..., x(n − 1), x(n).
Equation (3.8.4) leads to

(3.8.5)    $x(n+N+\nu) - a_1^{(\nu)}x(n+N-1) - \cdots - a_N^{(\nu)}x(n) = \eta_1^{(\nu)}(n+N+\nu)$





where $\eta_1^{(\nu)}(n+N+\nu)$ has zero mean and is independent of the chance variables
..., x(n − 1), x(n). If the operator

$E\{\cdots, x(n-1), x(n);\ \cdot\ \}$

is applied to this equation, the result is

(3.8.6)    $E\{\cdots, x(n-1), x(n);\ x(n+N+\nu)\} - \sum_{m=1}^{N} a_m^{(\nu)} E\{\cdots, x(n-1), x(n);\ x(n+N-m)\} = 0.$

The last term in the sum is a multiple of x(n) and (3.8.6) is thus the desired linear relation.
Conversely suppose that the (N + 1)th chance variable in (3.8.1) is linearly
dependent on the first N:

(3.8.7)    $E\{\cdots, x(n-1), x(n);\ x(n+N)\} = \sum_{m=1}^{N} a_m E\{\cdots, x(n-1), x(n);\ x(n+N-m)\}, \qquad n = 0, \pm 1, \cdots.$

Define the variables $y_1(n), \cdots, y_N(n)$ of an N-dimensional t.h.G. process by

(3.8.8)    $y_j(n) = E\{\cdots, x(n-1), x(n);\ x(n+j-1)\}, \qquad j = 1, \cdots, N.$

Then

(3.8.9)    $E\{\cdots, y(n-1), y(n);\ y_j(n+1)\} = y_{j+1}(n), \qquad j = 1, \cdots, N-1,$
           $E\{\cdots, y(n-1), y(n);\ y_N(n+1)\} = E\{\cdots, x(n-1), x(n);\ x(n+N)\} = \sum_{m=1}^{N} a_m y_{N+1-m}(n).$

The y(n) process is therefore a t.h.G.M. process, with transition matrix (3.7.6),
and the x(n) process is a component process.

The following particular type of t.h.G. process will be involved in the proof
of Theorem 3.9. If the chance variables {η(n)} determine a t.h.G. process
whose correlation function R_η(n) vanishes when n ≥ N, then according to
(1.3.5) the complex spectral function G_η(λ) of the η(n) process is continuous,
with derivative G_η'(λ) given by

(3.9.1)    $G_\eta'(\lambda) = \frac{1}{2\pi}\sum_{n=-(N-1)}^{N-1} e^{in\lambda} R_\eta(n).$





If α is a root of the equation

(3.9.2)    $\sum_{n=-(N-1)}^{N-1} R_\eta(n) z^n = 0,$

then ᾱ, 1/α, 1/ᾱ are also roots, of the same multiplicity. Moreover if |α| = 1,
α is a root of even multiplicity, since the sum in (3.9.2) is real and non-negative
when |z| = 1. When |z| = 1

(3.9.3)    $|\alpha|\,|(z - \alpha)(z - 1/\bar\alpha)| = |z - \alpha|^2.$

Hence G_η'(λ) can be written in the following simple form:

(3.9.4)    $G_\eta'(\lambda) = |b_0 e^{i(N-1)\lambda} + \cdots + b_{N-1}|^2,$

where the roots of the indicated polynomial have modulus at most 1, and the
coefficients are real.

THEOREM 3.9. Let ···, x(0), x(1), ··· be the variables of a one-dimensional
t.h.G. process. The process is a component process of a finite-dimensional t.h.G.M.
process if and only if the complex spectral function is the sum of the integral of
the square of the absolute value of a rational function of e^{iλ} with real coefficients,
and of a monotone non-decreasing function increasing only in a finite number of
jumps.^15 Specifically:

(i) The process is a component process of an N-dimensional t.h.G.M. process
if and only if the complex spectral function has the form

(3.9.5)    $G(\lambda) = \int_{-\pi}^{\lambda} \frac{|\beta_0 e^{i(N-1)\lambda} + \cdots + \beta_{N-1}|^2}{|\alpha_0 e^{iN\lambda} + \cdots + \alpha_N|^2}\, d\lambda + \hat G(\lambda),$

where

(a) Ĝ(λ) is a monotone non-decreasing function satisfying (1.3.3), increasing
only by jumps, at no more than N points;

(b) the denominator of the integrand vanishes at every discontinuity of Ĝ(λ),
and the numerator vanishes at every zero of the denominator, to at least the
same order;

(c) the coefficients α_0, ···, α_N, β_0, ···, β_{N−1} are real, α_0 ≠ 0, β_0 ≠ 0 unless
the integrand vanishes identically, and the roots of the polynomials in the
integrand have modulus less than or equal to 1.

The integral vanishes identically if and only if the x(n) process is a component process
of an N-dimensional deterministic process, and Ĝ(λ) vanishes identically if and only
if the variables x(n) vanish identically or the x(n) process is a component process of
an N-dimensional t.h.G.M. process with no deterministic factor.

(ii) The process is a t.h.G.M_N process (deterministic case) if and only if the


15 It is easily seen that the first term of the two can also be described simply as the
integral of a rational function of e^{iλ}, which is non-negative for λ real and is an even
function of λ like all complex spectral density functions.







complex spectral function G(λ) = Ĝ(λ) is a monotone non-decreasing function satisfying
(1.3.3) increasing only in jumps, at no more than N points; (non-deterministic
case) if and only if the complex spectral function has the form

(3.9.6)    $G(\lambda) = \int_{-\pi}^{\lambda} \frac{d\lambda}{|\alpha_0 e^{iN\lambda} + \cdots + \alpha_N|^2},$

where α_0, ···, α_N are real and α_0 ≠ 0.

Proof of (i). If the x(n) process is a one-dimensional component of an N-dimensional
t.h.G.M. process, it has already been seen that for properly chosen
real numbers a_1, ···, a_N, (3.8.4) is true, where η_1(n + N) is independent of
the chance variables ···, x(n − 1), x(n). Equation (3.8.2) can be assumed to
have all its roots of modulus less than or equal to 1. It follows from (3.8.4)
that η_1(n) is independent of η_1(m) if |m − n| ≥ N. The complex spectral
function of the η_1(n) process is therefore continuous, with derivative given by
(3.9.4). It will be no restriction to assume that b_0 ≠ 0 unless the derivative
vanishes identically. According to (1.3.6), if G(λ) is the complex spectral function
of the x(n) process,

(3.9.7)    $E\{\eta_1(0)\eta_1(n)\} = \int_{-\pi}^{\pi} e^{in\lambda}\,|b_0 e^{i(N-1)\lambda} + \cdots + b_{N-1}|^2\, d\lambda$
           $= \int_{-\pi}^{\pi} e^{in\lambda}\,|e^{iN\lambda} - a_1 e^{i(N-1)\lambda} - \cdots - a_N|^2\, dG(\lambda).$

Hence if Ĝ(λ) is the jump function of G(λ) (Ĝ(−π) = 0, and Ĝ(λ) is constant
except for jumps at the same points as those of G(λ), and of the same magnitude),

(3.9.8)    $\int_{-\pi}^{\lambda} |b_0 e^{i(N-1)\lambda} + \cdots + b_{N-1}|^2\, d\lambda = \int_{-\pi}^{\lambda} |e^{iN\lambda} - \cdots - a_N|^2\, d[G(\lambda) - \hat G(\lambda)]$
           $+ \int_{-\pi}^{\lambda} |e^{iN\lambda} - \cdots - a_N|^2\, d\hat G(\lambda).$

Since the first two integrals are continuous in λ, the last must be continuous also.
Hence the last integrand must vanish at every discontinuity of Ĝ(λ), that is at
every discontinuity of G(λ), and the last integral vanishes identically. It
follows that

(3.9.9)    $G(\lambda) - \hat G(\lambda) = \int_{-\pi}^{\lambda} \frac{|b_0 e^{i(N-1)\lambda} + \cdots + b_{N-1}|^2}{|e^{iN\lambda} - a_1 e^{i(N-1)\lambda} - \cdots - a_N|^2}\, d\lambda$

where the numerator vanishes at each zero of the denominator, with the same or
greater multiplicity. Since the denominator vanishes at each discontinuity
of G(λ), there can be at most N discontinuities. If the N-dimensional process
is a deterministic process, it can be assumed that all the roots of equation (3.8.2)
have modulus 1, that is that the denominator and hence also the numerator in
(3.9.9) have N roots. This can be true only if the numerator vanishes identically:
G(λ) = Ĝ(λ). If the N-dimensional process has no deterministic factor, it
can be assumed that all the roots of equation (3.8.2) have modulus less than 1.
Then G(λ) can have no discontinuities: Ĝ(λ) ≡ 0.

Conversely if G(λ) has the form described in Theorem 3.9 (i), G(λ) can be
assumed in the form (3.9.9) with real coefficients in numerator and denominator
and the stated relations between the jumps of Ĝ(λ) and the zeros of the numerator
and denominator in the integrand. (If the integrand vanishes identically and
if Ĝ(λ) has N discontinuities, a_1, ···, a_N can be chosen as those numbers making
the polynomial

    $e^{iN\lambda} - a_1 e^{i(N-1)\lambda} - \cdots - a_N$

vanish at the discontinuities of Ĝ(λ).) Then

(3.9.10)    $R(n+N) - a_1 R(n+N-1) - \cdots - a_N R(n)$
            $= \int_{-\pi}^{\pi} e^{i(n+1)\lambda}\, \frac{[b_0 e^{i(N-1)\lambda} + \cdots + b_{N-1}]\,[b_0 + \cdots + b_{N-1} e^{i(N-1)\lambda}]}{1 - a_1 e^{i\lambda} - \cdots - a_N e^{iN\lambda}}\, d\lambda$
            $+ \int_{-\pi}^{\pi} e^{in\lambda}\,[e^{iN\lambda} - \cdots - a_N]\, d\hat G(\lambda).$

The last integral vanishes since the bracket vanishes at every jump of Ĝ(λ).
The denominator in the first integrand is the value on |z| = 1 of a polynomial
all of whose roots are outside |z| = 1, or on |z| = 1. Any zero on |z| = 1
corresponds to one of the numerator at the same point. The integral therefore
vanishes if n ≥ 0 (Cauchy Integral Theorem):

(3.9.11)    $R(n+N) - a_1 R(n+N-1) - \cdots - a_N R(n) = 0,\quad n \ge 0.$

This equation implies that

(3.9.12)    $x(n+N) - a_1 x(n+N-1) - \cdots - a_N x(n)$

is independent of the chance variables ···, x(n − 1), x(n), that is that (3.8.4)
is true, where η_1(n + N) has the stated properties. It has already been seen
in the proof of Theorem 3.8 that this implies (3.8.6) and that this in turn implies
that the process is a component process of an N-dimensional t.h.G.M. process
whose transition matrix A has characteristic equation (3.8.2). In particular if
G(λ) = Ĝ(λ), the roots of the characteristic equation are of modulus 1, so that
the N-dimensional process must be deterministic. If Ĝ(λ) ≡ 0, the x(n) process
is a component process of an N-dimensional process whose transition matrix A
has only characteristic values of modulus less than 1. This N-dimensional
process can have no deterministic factors other than one or more of type M(0).
If these exist (and if the x(n) process is not of type M(0)), they can be replaced
by non-degenerate factors of type M, to obtain an N-dimensional process with
no deterministic factor, having the x(n) process as a component process.




Proof of (ii). If the x(n) process is a t.h.G.M_N process, (3.8.4) is true with
η_1(m) independent of η_1(n) if m ≠ n. The discussion in (i) is therefore simplified
by the fact that the numerator in (3.9.9) is constant. If this constant is 0,
the spectral function is a function of jumps: G(λ) = Ĝ(λ). If this constant is
not 0, the denominator in (3.9.9) does not vanish, and Ĝ(λ) therefore vanishes
identically. The converse is proved as in (i).

THEOREM 3.10. (i) If a_1, ···, a_N are real numbers, there is a one-dimensional
t.h.G. process not of type M(0) with correlation function R(n) satisfying

(3.10.1)    $R(n+N) - a_1 R(n+N-1) - \cdots - a_N R(n) = 0$

for n ≥ 0 if and only if the equation

(3.10.2)    $\alpha^N - a_1\alpha^{N-1} - \cdots - a_N = 0$

has at least one root of modulus less than or equal to 1.

Let ···, x(0), x(1), ··· be the variables of a one-dimensional t.h.G. process
not of type M(0).

(ii) This process is a component process of an N-dimensional t.h.G.M. process if
and only if the correlation function R(n) satisfies an Nth order linear difference
equation (3.10.1) for n ≥ 0.

(iii) The process is a t.h.G.M_N process if and only if the difference equation
(3.10.1) is true for n ≥ −(N − 1). In this case the vectors {x(n), ···,
x(n + N − 1)} determine an N-dimensional t.h.G.M. process.

(iv) Equation (3.10.1) is satisfied for n ≥ −N if and only if

(3.10.3)    $x(n+N) - a_1 x(n+N-1) - \cdots - a_N x(n) = 0,\quad n = 0, \pm 1, \cdots.$


Proof of (ii), (iii), (iv). Let the x(n) process be a component process of an
N-dimensional t.h.G.M. y(n) process with correlation function R_y(n): x(n) =
y_1(n), and transition matrix A. Since A satisfies its characteristic equation
(3.8.2), it follows from (3.6.5) that

(3.10.4)    $R_y(n+N) - a_1 R_y(n+N-1) - \cdots - a_N R_y(n) = 0,\quad n \ge 0.$

Then R(n) = (R_y(n))_{11} satisfies this same difference equation. Conversely if
(3.10.1) is true for n ≥ 0, it has already been proved in the course of the proof of
Theorem 3.9 that the x(n) process is a component process of an N-dimensional
t.h.G.M. process. This finishes the proof of (ii). Parts (iii) and (iv) are proved
similarly.

Proof of (i). According to (ii), if there is a one-dimensional t.h.G. process
whose correlation function R(n) satisfies (3.10.1) for n ≥ 0, the process is a component
process of an N-dimensional t.h.G.M. process whose transition matrix
A has (3.10.2) as characteristic equation. Since A has at least one characteristic
value of modulus less than or equal to 1 (unless the x(n) process is of type
M(0)), (3.10.2) must have at least one root of modulus less than or equal to 1.
Conversely if (3.10.2) has at least one such root, there is a real N-dimensional
matrix A whose characteristic equation is (3.10.2), and which has simple elementary
divisors. According to Theorem 3.6 (ii), A is then the transition matrix
of some t.h.G.M. process. The correlation function of this process and hence
that of each component process satisfies (3.10.1) for n ≥ 0.
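As a numerical illustration of parts (iii) and (iv) of Theorem 3.10, the following Python sketch computes the correlation function of a second order autoregressive scheme and evaluates the left side of (3.10.1) at several indices. The coefficients 0.5 and −0.3, the unit innovation variance, the truncation length, and all function names are assumptions chosen for the example only.

```python
def impulse_response(a, count):
    """psi_0, ..., psi_{count-1} with x(n) = sum_k psi_k eta(n - k)."""
    psi = [1.0]
    for k in range(1, count):
        psi.append(sum(a[j] * psi[k - 1 - j] for j in range(min(len(a), k))))
    return psi


def correlation(a, max_lag, terms=400):
    """R(0), ..., R(max_lag) for mutually independent eta(n) of unit variance."""
    psi = impulse_response(a, terms + max_lag)
    return [sum(psi[k] * psi[k + n] for k in range(terms)) for n in range(max_lag + 1)]


def residual(a, R, n):
    """Left side of (3.10.1) at index n, using R(-m) = R(m)."""
    order = len(a)
    return R[abs(n + order)] - sum(
        a[j - 1] * R[abs(n + order - j)] for j in range(1, order + 1)
    )


a = [0.5, -0.3]                 # assumed a_1, a_2; both roots have modulus below 1
R = correlation(a, max_lag=6)
residuals = {n: residual(a, R, n) for n in range(-2, 5)}
```

Under these assumptions the residual is negligible for n ≥ −(N − 1) = −1, while at n = −N = −2 it equals the innovation variance, so the degenerate relation (3.10.3) does not hold for this non-deterministic example.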


THEOREM 3.11. (i) If a_1, ···, a_N are real numbers, there is a one-dimensional
t.h.G. process not of type M(0) satisfying

(3.11.1)    $x(n+N) - a_1 x(n+N-1) - \cdots - a_N x(n) = \eta(n+N),\quad n = 0, \pm 1, \cdots,$

with η(m), η(n) independent for |m − n| ≥ N if and only if (3.10.2) has at least
one root of modulus less than or equal to 1.

Let ···, x(0), x(1), ··· be the variables of a one-dimensional t.h.G. process.

(ii) This process is a component process of an N-dimensional t.h.G.M. process
if and only if (3.11.1) is true with η(m), η(n) independent for |m − n| ≥ N.
In this case η(n + N) will be independent of the chance variables ···, x(n − 1),
x(n).

(iii) The process is a t.h.G.M_N process if and only if in addition to the condition
in (ii), η(n) is independent of the chance variables ···, x(n − 2), x(n − 1):
deterministic case if η(n) = 0 with probability 1, nondeterministic case otherwise.

Since this theorem follows readily from the preceding theorems, the proof will 
be omitted. 

The problem of predicting x(n) in terms of ···, x(n − 2), x(n − 1) is trivial
(theoretically at least) for t.h.G.M_N processes. In fact these were defined as
those processes for which the solution of the prediction problem is simply a
linear combination $\sum_{j=1}^{N} a_j x(n-j)$ of the preceding N variables. The solution
will now be given for the more general class of component processes of N-dimensional
t.h.G.M. processes, processes which have been described from several
points of view in the preceding theorems.

The prediction problem for component processes of N-dimensional t.h.G.M.
processes will be put into a more general setting. If the one-dimensional chance
variables {x(n)} determine a t.h.G. process, with correlation function R(n),
the problem of finding E{···, x(n − 2), x(n − 1); x(n)} is that of finding a
series $\sum_{m=1}^{\infty} \gamma_m x(n-m)$^16 such that

(3.12.1)    $x(n) - \sum_{m=1}^{\infty} \gamma_m x(n-m)$

is uncorrelated with every x(n − ν) (ν > 0):

(3.12.2)    $R(\nu) - \sum_{m=1}^{\infty} \gamma_m R(\nu - m) = 0,\quad \nu > 0.$


16 We are neglecting all convergence difficulties. They become trivial for the applica- 
tions to be made below. 





If the complex spectral function is G(λ), (3.12.2) becomes

(3.12.3)    $\int_{-\pi}^{\pi} e^{i\nu\lambda}\Big[1 - \sum_{m=1}^{\infty} \gamma_m e^{-im\lambda}\Big]\, dG(\lambda) = 0,\quad \nu > 0.$

Let G(λ) be the integral of its derivative G'(λ), that is let G(λ) be absolutely
continuous. According to (3.12.3) the problem reduces to that of finding a function

(3.12.4)    $f(z) = 1 - \frac{\gamma_1}{z} - \frac{\gamma_2}{z^2} - \cdots,\quad z = e^{i\lambda},$

such that f(z)G' is of power series type, a function corresponding to an expansion
in non-negative powers of z. The dispersion of the error of the prediction is

(3.12.5)    $E\Big\{\Big|x(n) - \sum_{m=1}^{\infty} \gamma_m x(n-m)\Big|^2\Big\} = \int_{-\pi}^{\pi} \Big|1 - \sum_{m=1}^{\infty} \gamma_m e^{-im\lambda}\Big|^2 dG(\lambda) = \int_{-\pi}^{\pi} |f|^2\, dG(\lambda).$

In particular if the x(n) process is a component process of an N-dimensional
t.h.G.M. process, G(λ) is given by (3.9.5). It will be supposed throughout the
following that Ĝ ≡ 0. Then, with z = e^{iλ},

(3.12.6)    $G'(\lambda) = \frac{z\,(\beta_0 z^{N-1} + \cdots + \beta_{N-1})(\beta_0 + \cdots + \beta_{N-1} z^{N-1})}{(\alpha_0 z^N + \cdots + \alpha_N)(\alpha_0 + \cdots + \alpha_N z^N)}.$

In this case f = 1 if G' ≡ 0, and otherwise f is given by

(3.12.7)    $f(z) = \frac{\beta_0(\alpha_0 z^N + \cdots + \alpha_N)}{\alpha_0 z\,(\beta_0 z^{N-1} + \cdots + \beta_{N-1})}$

so that

(3.12.8)    $|f(z)|^2\, G'(\lambda) = \frac{\beta_0^2}{\alpha_0^2}.$

The dispersion of the prediction error is R(0) if G' ≡ 0 and otherwise is 2πβ_0²/α_0².
The prediction formula for x(n) in terms of the variables ···, x(n − ν − 1),
x(n − ν) has now been derived for ν = 1, for the chance variables under discussion
in this section. The solution for general ν is easily obtained.
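The prediction recipe can be checked numerically. The sketch below is an illustration under stated assumptions: N = 2, the particular real coefficients in `ALPHA` and `BETA` (all roots inside the unit circle), no jump component, a truncation of the series at 60 terms, and the convention R(n) = ∫ e^{inλ} G'(λ) dλ over (−π, π). All names are hypothetical. It expands f of (3.12.7) in powers of 1/z to obtain the γ_m, and then tests the orthogonality conditions (3.12.2) and the error dispersion 2πβ_0²/α_0².

```python
import cmath
import math

ALPHA = [1.0, -0.5, 0.06]   # assumed alpha_0 z^2 + alpha_1 z + alpha_2, roots 0.2 and 0.3
BETA = [1.0, 0.4]           # assumed beta_0 z + beta_1, root -0.4


def poly(coeffs, z):
    """Evaluate c_0 z^d + ... + c_d by Horner's rule."""
    value = 0.0
    for c in coeffs:
        value = value * z + c
    return value


def density(lam):
    """Spectral density |beta(z)|^2 / |alpha(z)|^2 on z = e^{i lambda}."""
    z = cmath.exp(1j * lam)
    return abs(poly(BETA, z)) ** 2 / abs(poly(ALPHA, z)) ** 2


def f_closed(z):
    """The function f of (3.12.7)."""
    return BETA[0] * poly(ALPHA, z) / (ALPHA[0] * z * poly(BETA, z))


def gammas(count):
    """gamma_1, ..., gamma_count from f = 1 - sum gamma_m z^-m (series division in 1/z)."""
    num = [BETA[0] * c for c in ALPHA]
    den = [ALPHA[0] * c for c in BETA]
    q = []
    for k in range(count + 1):
        acc = num[k] if k < len(num) else 0.0
        for j in range(1, min(k, len(den) - 1) + 1):
            acc -= den[j] * q[k - j]
        q.append(acc / den[0])
    return [-c for c in q[1:]]


GRID = 4096
NODES = [-math.pi + 2 * math.pi * k / GRID for k in range(GRID)]
VALUES = [density(lam) for lam in NODES]


def correlation(n):
    """R(n) by the periodic trapezoid rule; the density is even, so only cosines remain."""
    return (2 * math.pi / GRID) * sum(math.cos(n * lam) * v for lam, v in zip(NODES, VALUES))


M = 60
G = gammas(M)
R = [correlation(n) for n in range(M + 4)]
normal_eq = [R[nu] - sum(G[m - 1] * R[abs(nu - m)] for m in range(1, M + 1)) for nu in (1, 2, 3)]
error_dispersion = R[0] - sum(G[m - 1] * R[m] for m in range(1, M + 1))
```

With these assumed coefficients the entries of `normal_eq` vanish to rounding accuracy and `error_dispersion` agrees with 2π, the value of 2πβ_0²/α_0² when β_0 = α_0 = 1.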

As ν → ∞, the prediction converges with probability 1, according to the
corollary to Theorem 1.2. If the process is a component process of an N-dimensional
t.h.G.M. process, and if Ĝ ≡ 0 in (3.9.5), the limit is 0. That is,
in this case, the best predicted value of x(n) in terms of the distant past is near
E{x(n)} = 0, the same predicted value which would be used with no knowledge
of the past.







4. Processes whose parameter t varies continuously


The basic process in terms of which t.h.G.M. processes without deterministic
factors were expressed in section 3 was a process whose variables {ξ(n)} were
Gaussian, with

(4.1.1)    $E\{\xi(n)\} = 0,\quad E\{\xi(m)\cdot\xi(n)\} = \delta_{mn} I.$


The corresponding process in the continuous parameter case is not obtained by
replacing the integral parameters m, n in (4.1.1) by continuous parameters. In
fact the process so defined does not satisfy any useful continuity conditions.
In the present discussion, sums like $\sum_m A_m \xi(m)$ will be replaced by Stieltjes
integrals $\int A(t)\, d\xi(t)$, and dξ(t) thus will correspond to ξ(n). The ξ(t) process
is defined as follows. For any t_1 < ··· < t_n, the chance variables

    $\xi(t_2) - \xi(t_1),\ \cdots,\ \xi(t_n) - \xi(t_{n-1})$

are mutually independent N-dimensional Gaussian chance variables, and if
s < t,

(4.1.2)    $E\{\xi(t) - \xi(s)\} = 0,\quad E\{[\xi(t) - \xi(s)]\cdot[\xi(t) - \xi(s)]\} = (t - s) I.$
This process, called simply a ξ-process below, has been discussed in great detail
by Bachelier, Wiener and Lévy. The function ξ(t), considered as a function of t,
is known to be continuous with probability 1.^17 The derivative ξ'(t) does not
exist, since E{[ξ_j(t + h) − ξ_j(t)]²} is proportional to h, whereas this mean would
be proportional to h² if ξ'(t) existed. In fact it has been shown that ξ(t) is
(with probability 1) not even of bounded variation in any finite interval. However,
if f(t) is a function defined and continuous for a ≤ t ≤ b (where a or b or
both may be infinite), the integral

(4.1.3)    $\int_a^b f(t)\, d\xi(t)$

can be defined as the limit in the mean of the usual Stieltjes sum. If f(t) has a
continuous derivative, the integral in (4.1.3) can be evaluated by integration
by parts:

(4.1.4)    $\int_a^b f(t)\, d\xi(t) = f(b)\xi(b) - f(a)\xi(a) - \int_a^b \xi(t) f'(t)\, dt.$
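The integration by parts formula can be illustrated on a simulated path. The following Python sketch is only an approximation under stated assumptions: the ξ-process is one-dimensional and replaced by a Gaussian random walk on an even grid, the seed and grid size are arbitrary, f(t) = cos 3t is an arbitrary smooth test function, and all function names are hypothetical.

```python
import math
import random


def brownian_path(steps, a, b, seed=0):
    """Values xi(t_0), ..., xi(t_steps) on an even grid of [a, b], with xi(a) = 0."""
    rng = random.Random(seed)
    dt = (b - a) / steps
    path = [0.0]
    for _ in range(steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


def stieltjes_sum(f, path, a, b):
    """Left-endpoint sum of f(t_k) [xi(t_{k+1}) - xi(t_k)]."""
    steps = len(path) - 1
    dt = (b - a) / steps
    return sum(f(a + k * dt) * (path[k + 1] - path[k]) for k in range(steps))


def by_parts(f, df, path, a, b):
    """Right side of the integration by parts formula, with a Riemann sum for the integral."""
    steps = len(path) - 1
    dt = (b - a) / steps
    integral = sum(path[k + 1] * df(a + (k + 0.5) * dt) * dt for k in range(steps))
    return f(b) * path[-1] - f(a) * path[0] - integral


def f(t):
    return math.cos(3.0 * t)


def df(t):
    return -3.0 * math.sin(3.0 * t)


path = brownian_path(2000, 0.0, 1.0)
left = stieltjes_sum(f, path, 0.0, 1.0)
right = by_parts(f, df, path, 0.0, 1.0)
```

The two numbers agree up to the discretization error of the Riemann sum, which reflects the fact that the Stieltjes sum and its summation by parts are algebraically identical on the grid.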


Integrals of the following type will be used below:

(4.1.5)    $y(t) = \int_a^t f(t - \tau)\, d\xi(\tau) = f(0)\xi(t) - f(t - a)\xi(a) + \int_a^t \xi(\tau) f'(t - \tau)\, d\tau$


17 Paley and Wiener, "Fourier transforms in the complex domain," Am. Math. Soc.
Colloq. Pub., Vol. 19, p. 148.







where f(t) is continuous and has two continuous derivatives. It is then evident
that y(t) is continuous, but that y'(t) exists if and only if f(0) = 0. If f(0) = 0,
y'(t) is given by

(4.1.6)    $y'(t) = \int_a^t f'(t - \tau)\, d\xi(\tau).$


A more general process will also come into the discussion below, and will be
called a ζ-process. The chance variables {ζ(t)} of a ζ-process are Gaussian, and
have the same independence property as the variables of a ξ-process. The
second equation of (4.1.2) is dropped, so that (4.1.2) is replaced by

(4.1.7)    $E\{\zeta(t) - \zeta(0)\} = 0,\quad E\{[\zeta(t) - \zeta(0)]\cdot[\zeta(t) - \zeta(0)]\} = D(t),$

where the symmetric and non-negative definite matrix D(t) will sometimes be
supposed to have special properties, such as continuity in t, etc. The independence
property of the ζ-process implies that

(4.1.8)    $E\{[\zeta(t) - \zeta(s)]\cdot[\zeta(t) - \zeta(s)]\} = D(t) - D(s).$

Hence D'(t) (if this derivative exists) is symmetric and non-negative definite.

THEOREM 4.1. If the dispersion matrix D(t) of a ζ-process is continuous, the
functions {ζ(t)} are continuous in t, with probability 1.

The component processes of a ζ-process with a continuous dispersion function
are also ζ-processes with continuous dispersion functions. Hence it will be
sufficient to prove the theorem in the one-dimensional case. In this case D(t)
is non-negative and monotone non-decreasing, according to (4.1.7) and (4.1.8).
It can be supposed that D(t) does not vanish identically. Let D_1(t) be an inverse
function of D(t): D[D_1(t)] = t. Then ξ(t) = ζ[D_1(t)] defines a ξ-process, and the
continuity of ξ(t) implies that of ζ(t).

The integrals of type (4.1.3) are defined for ζ-processes as for ξ-processes, and
satisfy the equations

(4.1.9)    $E\Big\{\int_a^b f(t)\, d\zeta(t)\Big\} = 0,$
           $E\Big\{\int_a^b f(t)\, d\zeta(t) \cdot \int_a^b g(t)\, d\zeta(t)\Big\} = \int_a^b f(t) g(t) D'(t)\, dt,$
           $E\Big\{\int_a^b A(t)\, d\zeta(t) \cdot \int_a^b B(t)\, d\zeta(t)\Big\} = \int_a^b A(t) D'(t) B(t)^*\, dt,$

where f, g are numerically valued functions and A, B are matrix functions.^18

The ζ-processes lie at the basis of t.h.G. processes. To every t.h.G. process
(discrete parameter) with variables {x(n)} correspond two one-dimensional
ζ-processes with variables {ζ_1(t)}, {ζ_2(t)} such that

(4.1.10)    $x(n) = \int \cos n\lambda\, d\zeta_1(\lambda) + \sin n\lambda\, d\zeta_2(\lambda)$
18 These equations are easily proved using the fact that each integral can be approxi- 
mated by the usual Riemann-Stieltjes sums. 










































where the two ζ-processes are mutually independent in the sense that every
ζ_1(λ_1) is independent of every ζ_2(λ_2) and where, if G(λ) is the complex spectral
function of the process,

(4.1.11)    $E\{\zeta_j(\lambda)^2\} = G(\lambda).$

In the continuous parameter case (4.1.10) becomes

(4.1.12)    $x(t) = \int \cos t\lambda\, d\zeta_1(\lambda) + \sin t\lambda\, d\zeta_2(\lambda).$^19

This theorem of Cramér shows that x(n), or x(t) as the case may be, is the limit
of a sum of sines and cosines, with Gaussian chance variables as coefficients.
The dispersion of each coefficient, which measures the intensity of the corresponding
periodic term of the sum, is determined by the spectral function of
the process. In particular, if the spectral function F(λ) is the integral of its
derivative F'(λ), each integrand involving dζ_j(λ) in the above equations can be
replaced by one involving $\sqrt{F'(\lambda)}\, d\xi_j(\lambda)$ where ξ_j(λ) is the variable of a ξ-process.
Thus in many important cases the processes can be written in a simple way in
terms of ξ-processes.
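A finite sum of the kind that the representation approximates shows why the dispersions of the coefficients determine the correlation function. In the Python sketch below, x(n) is taken to be a finite sum of terms cos(nλ_k) A_k + sin(nλ_k) B_k with independent coefficients of mean 0, where A_k and B_k share the dispersion σ_k². The frequencies, dispersions and function names are assumptions made for the illustration.

```python
import math


def covariance(m, n, freqs, variances):
    """E{x(m) x(n)} for x(n) = sum_k [cos(n l_k) A_k + sin(n l_k) B_k]."""
    return sum(
        v * (math.cos(m * lam) * math.cos(n * lam) + math.sin(m * lam) * math.sin(n * lam))
        for lam, v in zip(freqs, variances)
    )


def correlation(lag, freqs, variances):
    """The same quantity written as a function of the lag alone."""
    return sum(v * math.cos(lag * lam) for lam, v in zip(freqs, variances))


freqs = [0.3, 1.1, 2.5]          # assumed frequencies in (0, pi)
variances = [2.0, 0.5, 1.25]     # assumed dispersions of the coefficient pairs
gap = max(
    abs(covariance(m, n, freqs, variances) - correlation(m - n, freqs, variances))
    for m in range(-5, 6)
    for n in range(-5, 6)
)
```

The covariance depends on m and n only through m − n, so the finite sum is temporally homogeneous, and each dispersion plays the part of a jump of the spectral function at the corresponding frequency.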
It will be shown below that every t.h.G.M. process can be represented as the
direct product of factors of certain types. The deterministic types have already
been catalogued: M(0), M(1), M(e^{iθ}). The standard non-deterministic type,
as in the discrete parameter case, will be called type M.

M. Let {ξ(t)} be the chance variables of an N-dimensional ξ-process, as described
above. Let Q be an N-dimensional square matrix, and let S be an N-dimensional
symmetric non-negative definite matrix. Define x(t) by

(4.2.1)    $x(t) = \int_0^{\infty} e^{sQ} S\, d\xi(t - s) = \int_{-\infty}^{t} e^{(t-\tau)Q} S\, d\xi(\tau)$

where it is supposed that the improper integrals converge with probability 1.
There will be convergence, for example, if Q has only characteristic values with
negative real parts so that the elements in the matrix e^{sQ} go to 0 exponentially
as s → ∞. (Cf. section 1.) It will be shown below that it is no restriction to
assume that Q has this character. The x(t) process is evidently a t.h.G. process.
If u < t, the chance variable

(4.2.2)    $x(t) - e^{(t-u)Q} x(u) = e^{tQ} \int_u^t e^{-sQ} S\, d\xi(s)$

is independent of x(v) for v < u, since x(v) is expressed in terms of ξ(s) for s < v.
Therefore the x(t) process is a Markoff process with transition matrix A(t) =
e^{tQ}:

(4.2.3)    $E\{x(v),\ v \le u;\ x(t)\} = e^{(t-u)Q} x(u),\quad u < t.$
19 H. Cramér, Arkiv för Matematik, Astronomi och Fysik, Vol. 28B, No. 12, pp. 1-17.
Cramér only discusses the continuous parameter case, but the other requires no change of
method. He allows complex-valued processes, in terms of which (4.1.10) and (4.1.12)
assume a more elegant form.




A process defined in this way will be called a process of type M. A change of
variable y(t) = Bx(t) leads to a process of the same type:

(4.2.4)    $y(t) = \int_0^{\infty} e^{sBQB^{-1}} BS\, d\xi(t - s).$

The matrix Q goes into BQB^{-1} and if S_1 O is the polar form of BS, where S_1 is
symmetric and non-negative definite and O is orthogonal, S goes into S_1. (We
are using the fact that Oξ(t) defines a second ξ-process.) The correlation function
of a process of type M is easily calculated:

(4.2.5)    $R(0) = \int_0^{\infty} e^{sQ} S^2 e^{sQ^*}\, ds,\quad R(t) = R(0)\, e^{tQ^*}.$

The only condition imposed on Q is that the improper integrals in (4.2.1) converge.
This condition is easily seen to be equivalent to the convergence of the
integral in (4.2.5). This in turn is equivalent to the condition that

(4.2.6)    $\lim_{s \to \infty} e^{sQ} S = 0.$

This condition is certainly satisfied if the characteristic values of Q all have
negative real parts, and it can always be assumed that this is so. (Cf. the corresponding
discussion of processes of type M in the discrete parameter case.)

The analogues in the continuous parameter case of Theorems 3.4 and 3.5 are 
true. The proofs are substantially the same as the proofs in the discrete param- 
eter case, and will be omitted. 

THEOREM 4.3. (i) Every t.h.G.M. process (continuous parameter) is the direct
product of processes of type M(0), M(1), M(e^{iθ}), M.

(ii) If x(t) are the variables of such a process, there is a matrix Q such that A(t) =
e^{tQ} is a transition matrix function. There is a ξ-process, a Gaussian variable ξ,
independent of the ξ(t), satisfying

(4.3.1)    $E\{\xi\} = 0,\quad E\{\xi\cdot\xi\} = I$

and symmetric non-negative definite matrices S, T such that

(4.3.2)    $x(t) = \int_0^{\infty} e^{sQ} S\, d\xi(t - s) + e^{tQ} T\xi$
           $= \int_{-\infty}^{t} e^{(t-s)Q} S\, d\xi(s) + e^{tQ} T\xi = \int_0^{t} e^{(t-s)Q} S\, d\xi(s) + e^{tQ} x(0),$

(4.3.3)    $QT^2 + T^2 Q^* = 0,$

(4.3.4)    $R(0) = \int_0^{\infty} e^{sQ} S^2 e^{sQ^*}\, ds + T^2,$

(4.3.5)    $QR(0) + R(0)Q^* = -S^2,$

where the integrals in (4.3.2) converge with probability 1. The integral and the
last term in each pair in (4.3.2) are linear transformations of x(t): (4.3.2) exhibits
in part the decomposition into factor processes described in (i). The correlation
function is given by

(4.3.6)    $R(t) = R(0)\, e^{tQ^*},\quad R(-t) = e^{tQ} R(0),\quad t \ge 0.$


(iii) The matrix Q is uniquely determined if and only if the process is non-degenerate.
In any case there is a Q whose characteristic values all have negative
or zero real parts and whose characteristic values with zero real parts correspond to
simple elementary divisors. The matrix Q furnishes the solution to the prediction
problem of the process:

(4.3.7)    $E\{x(s),\ s \le t;\ x(t + u)\} = e^{uQ} x(t),\quad u > 0.$

The matrix S, which is uniquely determined, measures the dispersion of x(t) about
its predicted value:

(4.3.8)    $E\{[x(t+u) - e^{uQ}x(t)]\cdot[x(t+u) - e^{uQ}x(t)]\} = R(0) - e^{uQ} R(0) e^{uQ^*} \sim uS^2 \quad (u \to 0).$

(iv) Conversely if Q is a matrix with at least one characteristic value with negative
real part or with zero real part and corresponding to a simple elementary divisor,
e^{tQ} is the transition matrix function of a t.h.G.M. process with R(0) not the null
matrix. If all the characteristic values of Q are as just described, e^{tQ} is the transition
matrix function of a non-degenerate t.h.G.M. process. If R(0), S, Q are
matrices satisfying (4.3.5) with R(0), S symmetric and non-negative definite, there
is a t.h.G.M. process whose variables can be written in the form (4.3.2) with the
given R(0), S, Q.

The proof of Theorem 4.3 follows closely that of Theorem 3.6, and the details 
will not be given, except as they differ from those of the earlier proof. 

Proof of (i). Suppose that the {x(t)} are the variables of a t.h.G.M. process
which is non-degenerate. The transition matrix function A(t) is then uniquely
determined by (1.5.5). Take the conditional expectation of both sides of
(1.5.4) for given x(0):

(4.3.9)    $A(s + t)x(0) = A(t)A(s)x(0),\quad s, t > 0.$

Since the process is non-degenerate,

(4.3.10)    $A(s + t) = A(s)A(t),\quad s, t > 0.$

According to (1.3.1) and (1.5.5)

(4.3.11)    $\lim_{t \to 0} R(t) = \lim_{t \to 0} R(0)A(t)^* = R(0),\quad t > 0.$

Hence

(4.3.12)    $\lim_{t \to 0} A(t) = I.$

It has already been noted that any solution to (4.3.10) under the continuity
hypothesis (4.3.12) can be written in the form

(4.3.13)    $A(t) = e^{tQ},$

where

    $Q = \lim_{t \to 0} \frac{A(t) - I}{t}.$

Under a change of variables y(t) = Bx(t), A(t) becomes BA(t)B^{-1} and Q becomes
BQB^{-1}. According to Theorem 2.1, if the x(t) process is degenerate, it is the
direct product of one or more factors of type M(0) and (perhaps) of a non-degenerate
factor. The matrix Q of a factor of type M(0) can be taken as the
null matrix. Then the form (4.3.13) is admissible for any t.h.G.M. process,
although Q will only be uniquely determined if the process is non-degenerate.
Define ζ(t) by

(4.3.14)    $\zeta(t) = A(t)^{-1} x(t) = e^{-tQ} x(t).$

Then if s_1 < t_1 < s_2 < t_2

(4.3.15)    $E\{\zeta(t_1) - \zeta(s_1)\} = 0,\quad E\{[\zeta(t_2) - \zeta(s_2)]\cdot[\zeta(t_1) - \zeta(s_1)]\} = 0$

and

(4.3.16)    $D(t) = E\{[\zeta(t) - \zeta(0)]\cdot[\zeta(t) - \zeta(0)]\} = e^{-tQ} R(0) e^{-tQ^*} - R(0).$

Hence the {ζ(t)} determine a ζ-process, with dispersion matrix given by (4.3.16).
The derivative D'(t) is easily evaluated:

(4.3.17)    $D'(t) = e^{-tQ}\,[-R(0)Q^* - QR(0)]\, e^{-tQ^*}.$

Since D'(t) is symmetric and non-negative definite, the bracket also has this
property, and there is a non-singular matrix S_1 such that

(4.3.18)    $S_1[-QR(0) - R(0)Q^*]S_1^* = U,$

where U is in diagonal form, with only 0's and 1's in the main diagonal. Then
the integral

(4.3.19)    $\int_0^t S_1 e^{sQ}\, d\zeta(s)$

defines a ζ-process with dispersion matrix tU. There is therefore a ξ-process
with variables {ξ(t)} such that

(4.3.20)    $U\xi(t) = \int_0^t S_1 e^{sQ}\, d\zeta(s).$

This equation can be solved for ζ(t) and x(t):

(4.3.21)    $x(t) = e^{tQ}\zeta(t) = e^{tQ}\int_0^t e^{-sQ} S_2 U\, d\xi(s) + e^{tQ} x(0)$

where S_2 = S_1^{-1}. The matrix S_2 U can be written in the polar form SO where S
is symmetric and non-negative definite and O is orthogonal. This S is the S
of (4.3.2) etc.

The remainder of the proof follows closely the proof of Theorem 3.6 and will
be omitted.

An important class of t.h.G.M. processes which arises frequently in physical
applications is obtained in the following way. Let {ξ(t)} be the variables of a
one-dimensional ξ-process. Consider the formal equation

(4.4.1)    $\frac{d^N y(t)}{dt^N} - a_1\frac{d^{N-1} y(t)}{dt^{N-1}} - \cdots - a_N y(t) = c\,\xi'(t),$

where a_1, ···, a_N, c are constants. This equation cannot be considered precise
as it stands, since ξ'(t) does not exist. The problem can however be reformulated
as follows: find a y(t) process, where y', ···, y^{(N−1)} are supposed to exist, satisfying
the equation

(4.4.2)    $\int_a^b f(t)\, dy^{(N-1)}(t) - a_1\int_a^b f(t)\, dy^{(N-2)}(t) - \cdots - a_N\int_a^b f(t) y(t)\, dt = c\int_a^b f(t)\, d\xi(t)$

with probability 1, for each continuous function f(t) and each pair of numbers
a, b. The formal integrals are defined as the limit in the mean of the usual
sums.^20 The integral on the right has already been discussed. With this
interpretation, equations involving ξ' can be treated in the usual way, and this
will be done in the following without further comment. The formal solution
of (4.4.2) is well known. Let λ_1, ···, λ_N be the roots of the equation

(4.4.3)    $\lambda^N - a_1\lambda^{N-1} - \cdots - a_N = 0$


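and suppose that these roots are distinct, and have negative real parts. Let
Λ_{kj} be the cofactor of λ_j^k in the determinant

(4.4.4)    $\delta = \begin{vmatrix} 1 & \cdots & 1 \\ \lambda_1 & \cdots & \lambda_N \\ \vdots & & \vdots \\ \lambda_1^{N-1} & \cdots & \lambda_N^{N-1} \end{vmatrix}.$

Then the general solution of (4.4.1), that is to say of (4.4.2), is

(4.4.5)    $y(t) = \frac{c}{\delta}\int_0^t \sum_{j=1}^{N} \Lambda_{N-1,j}\, e^{\lambda_j(t-s)}\, d\xi(s) + \frac{1}{\delta}\sum_{j=1}^{N}\sum_{k=0}^{N-1} \Lambda_{kj}\, e^{\lambda_j t}\, y^{(k)}(0).$

Since the integrand and its first N − 1 derivatives vanish when s = t, y', ···,
y^{(N−1)} as defined by (4.4.5) exist, but y^{(N)}(t) does not exist, because ξ'(t) in (4.4.1)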


20 For a full discussion in the case N = 1 cf. Doob, Annals of Math., Vol. 43 (1942), pp.
358-61.







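does not exist. The y(t) process is a t.h.G. process if y(0), ···, y^{(N−1)}(0) are
chosen properly. This can be seen from the solution

(4.4.6)    $y(t) = \frac{c}{\delta}\int_{-\infty}^{t} \sum_{j=1}^{N} \Lambda_{N-1,j}\, e^{\lambda_j(t-s)}\, d\xi(s).$

In fact this is the only solution defining a t.h.G. process. To prove this, rewrite
(4.4.5) in the form

(4.4.5')    $y(t) = \frac{c}{\delta}\sum_{j=1}^{N} \Lambda_{N-1,j}\int_{t-\tau}^{t} e^{\lambda_j(t-s)}\, d\xi(s) + \frac{1}{\delta}\sum_{j=1}^{N}\sum_{k=0}^{N-1} \Lambda_{kj}\, e^{\lambda_j \tau}\, y^{(k)}(t - \tau).$

If the y(t) process is a t.h.G. process, (4.4.5') becomes (4.4.6) when τ → ∞.
Thus there is a unique stationary solution to (4.4.1) and, by (4.4.5), every solution
tends to this solution in the long run. The stationary solution (4.4.6)
has the property that y(t) is written in terms of ξ(s) for s ≤ t. Then in (4.4.5)
the integral is independent of the terms involving the initial conditions. In
other words

(4.4.7)    $E\{y(s),\ s \le 0;\ y(t)\} = \frac{1}{\delta}\sum_{j=1}^{N}\sum_{k=0}^{N-1} \Lambda_{kj}\, e^{\lambda_j t}\, y^{(k)}(0).$

Hence the variables y(t), y'(t), ···, y^{(N−1)}(t) define an N-dimensional t.h.G.M.
process. The transition matrix function A(t), and the matrices Q, S, T of
Theorem 4.3 are easily calculated.

(4.4.8)    $A(t) = \Big(\frac{1}{\delta}\sum_{j=1}^{N} \Lambda_{kj}\,\lambda_j^{\,i}\, e^{\lambda_j t}\Big)_{i,k = 0, \cdots, N-1},$

           $Q = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_N & a_{N-1} & a_{N-2} & \cdots & a_1 \end{pmatrix},\quad S = \begin{pmatrix} 0 & \cdots & 0 & 0 \\ \vdots & & \vdots & \vdots \\ 0 & \cdots & 0 & 0 \\ 0 & \cdots & 0 & |c| \end{pmatrix},\quad T = 0.$

The necessary changes to be made if the λ_j are not distinct are well known.
The case c = 0 will be treated below, when the problem will be reconsidered
from another point of view. In all cases the solution of (4.4.1) leads to an
N-dimensional t.h.G.M. process.^21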

As a simple example, consider a torsion pendulum, suspended in a sealed
container. The only turning forces acting on the pendulum are the molecular
shocks of the surrounding gas, and the restoring torque. The equation of motion
is

(4.4.9)    $I\frac{d^2 y(t)}{dt^2} + a_1\frac{dy(t)}{dt} + a_2\, y(t) = X(t),$


21 According to a letter from Uhlenbeck, the differential equation (4.4.1) was solved,
from a somewhat different point of view, by Miss Ming Chen Wang, in a thesis written in
1941 which is unfortunately inaccessible to me at the moment.




where y is the angular displacement measured from the equilibrium position,
I is the moment of inertia, $\alpha_2$ is the torque coefficient of the suspension, and the
molecular force is resolved into a systematic Stokes term $\alpha_1 y'$ and a remainder X.
The remainder term X(t) defines a stationary process which to a first approximation
is “purely random.” In the present context “purely random” means that if
$t_1 < \cdots < t_n$, $X(t_1), \cdots, X(t_n)$ are mutually independent. This is precisely
the property the derived process of a $\xi$-process would have, if $\xi'(t)$ existed.
Unfortunately it has already been noted that $\xi'(t)$ does not exist, since the
difference quotient $[\xi(t + h) - \xi(t)]/h$ is unbounded as $h \to 0$. It has already
been seen, however, that (4.4.9) can be given a meaning with X(t) identified
with $c\xi'(t)$ even though $\xi'(t)$ does not exist, and it has been seen that the solution
approaches a steady state. It may still be a disappointment to some that the
solution y(t) has a first derivative y'(t) but that y''(t) does not exist: there is an
angular velocity but not an angular acceleration! This unhappy circumstance
can either be blamed on the physical world, or on the mathematical approximation
to the physical world, depending on the point of view. The corresponding
electrical picture is the following. There are spontaneous currents in any
electrical circuit, due to the thermal motion of the electrons. This is known as
the Johnson effect. In a simple closed circuit, consisting of an inductance L,
a resistance R, and a capacitance C in series, the current equation can be written
in the form


(4.4.10)    $L \frac{d^2 y(t)}{dt^2} + R \frac{dy(t)}{dt} + \frac{y(t)}{C} = X(t),$


where y is the charge on the condenser and X(t) represents a fictitious voltage
set up by the motion of the electrons. The X(t) is identified with $c\xi'(t)$ as
before. In this case there is a current $dy/dt$, but the current function has no
derivative. In these applications, the physical justification for the Gaussian
character of the $\xi$-distribution lies in the Gaussian character of the Maxwell
distribution of elementary particle velocities. The known mean particle kinetic
energy determines the constant c in (4.4.1). The more complicated mechanical
or electrical systems will lead to equations of higher order than 2, or systems of
equations. For example the usual current equations of a net of resistances,
capacitances and inductances lead to a system of say $\nu$ second order equations
of type (4.4.10), and the corresponding pairs y, y' form a $2\nu$-dimensional t.h.G.M.
process.^{22}

The processes defined by linear differential equations of the type (4.4.1) are
the analogues of the t.h.G.M$_N$. processes in the discrete parameter case. Instead
of defining these solutions of (4.4.1) as the t.h.G.M$_N$. processes, however, we
shall use a definition closer to the definition in the discrete parameter case. A


22 Further discussion and references to papers by physicists on this subject will be found
in Doob, Annals of Math., Vol. 43 (1942), pp. 351-69.






one-dimensional t.h.G. process with variables {y(t)} will be called a t.h.G.M$_N$.
process if the derivatives $y'(t), \cdots, y^{(N-1)}(t)$ exist, and if whenever s < t,
(4.5.1)    $E\{y(\tau), \tau \le s; y(t)\} = E\{y(s), y'(s), \cdots, y^{(N-1)}(s); y(t)\}.$


If N = 1, the process is a t.h.G.M. process. The right hand side of (4.5.1)
is a linear combination of the variables $y(s), \cdots, y^{(N-1)}(s)$. The variables
{y(t)} thus satisfy an equation of the form

(4.5.2)    $y(t) - a_1(t - s)\, y(s) - \cdots - a_N(t - s)\, y^{(N-1)}(s) = \eta(s, t)$


where $\eta(s, t)$ is independent of the variables $\{y(\tau)\}$ for $\tau \le s$. Define the
variables {x(t)} of an N-dimensional t.h.G. process by

(4.5.3)    $x_1(t) = y(t),$
           $x_j(t) = y^{(j-1)}(t), \qquad j = 2, \cdots, N.$

If this process is degenerate, there is a relation of the form

(4.5.4)    $c_0 y(s) + c_1 y'(s) + \cdots + c_{N-1} y^{(N-1)}(s) = 0, \qquad \sum_{j=0}^{N-1} |c_j| > 0.$


It can be assumed that $c_{N-1} \ne 0$ (differentiating (4.5.4) to get a term in $y^{(N-1)}(t)$
if there is none originally). Then $y^{(N-1)}(s)$ can be eliminated in (4.5.2), to get
a relation of the same type with N replaced by N - 1. Hence the process is
non-degenerate if N is the minimum index for which (4.5.1) is true. It will now
be proved that the x(t) process is a t.h.G.M. process. It can be assumed to be
non-degenerate. Using (4.5.1),


(4.5.5)    $E\{x(\tau), \tau \le s; x_1(t)\} = E\{y(\tau), \tau \le s; y(t)\} = E\{x(s); x_1(t)\}.$
It must also be shown that
(4.5.6)    $E\{x(\tau), \tau \le s; x_j(t)\} = E\{x(s); x_j(t)\}, \qquad j = 2, \cdots, N.$


This will be shown by justifying the taking of derivatives in (4.5.5). It will be
sufficient to prove (4.5.6) when j = 2. Using (4.5.1),

(4.5.7)    $E\{x(\tau), \tau \le s; \frac{y(t+h) - y(t)}{h}\} = E\{x(s); \frac{y(t+h) - y(t)}{h}\}.$


The right hand side is a linear combination of $x_1(s), \cdots, x_N(s)$ whose coefficients
are continuous in h, $h \ge 0$, since the correlation function of the y(t) process is
continuous. Hence the right hand side converges to

    $E\{x(s); y'(t)\} = E\{x(s); x_2(t)\}$

when $h \to 0$. Since the difference

    $\frac{y(t+h) - y(t)}{h} - E\{x(s); \frac{y(t+h) - y(t)}{h}\}$







is uncorrelated with $x(\tau)$ if $\tau \le s$, the same is true of its limit as $h \to 0$. This
means that (4.5.6) is true when j = 2, as was to be shown. Conversely if {y(t)}
are the variables of a one-dimensional t.h.G. process, if $y'(t), \cdots, y^{(N-1)}(t)$
exist, and if the x(t) process defined by (4.5.3) is a t.h.G.M. process, the y(t)
process is obviously a t.h.G.M$_N$. process. The transition matrix function A(t)
and the matrices Q, S, T of Theorem 4.3 are easily calculated. Suppose that
the x(t) process is non-degenerate. Since $y^{(i-1)}(t)$ is given by


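(4.5.8)    $y^{(i-1)}(t) = x_i(t) = \int_{-\infty}^{t} \sum_{j=1}^{N} [e^{(t-s)Q} S]_{ij}\, d\xi_j(s) + \sum_{j=1}^{N} [e^{tQ} T]_{ij}\, \xi_j$
                        $= \int_{0}^{t} \sum_{j=1}^{N} [e^{(t-s)Q} S]_{ij}\, d\xi_j(s) + \sum_{j=1}^{N} (e^{tQ})_{ij}\, x_j(0)$

and since $x_i'(t)$ exists if i < N, it follows that the integrand must vanish when
s = t:
(4.5.9)    $(S)_{ij} = 0, \qquad i = 1, \cdots, N-1, \quad j = 1, \cdots, N.$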
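Since S is symmetric and non-negative definite, S must have the form

(4.5.10)    $S = \begin{pmatrix} 0 & \cdots & 0 & 0 \\ \vdots & & \vdots & \vdots \\ 0 & \cdots & 0 & 0 \\ 0 & \cdots & 0 & c \end{pmatrix}.$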


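The fact that $x_i'(t) = x_{i+1}(t)$ means that

(4.5.11)    $\sum_{j} (e^{tQ} Q)_{ij}\, x_j(0) = \sum_{j} (e^{tQ})_{i+1,j}\, x_j(0), \qquad i = 1, \cdots, N-1,$
or, since the x(t) process is non-degenerate,

(4.5.12)    $(e^{tQ} Q)_{ij} = (e^{tQ})_{i+1,j}, \qquad i = 1, \cdots, N-1, \quad j = 1, \cdots, N.$
Hence $(t \to 0)$ Q has the form

(4.5.13)    $Q = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_N & a_{N-1} & a_{N-2} & \cdots & a_1 \end{pmatrix}.$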


Conversely if there is an N-dimensional non-degenerate t.h.G.M. process with
transition matrix function $e^{tQ}$ where Q is given by (4.5.13) and dispersion matrix
S given by (4.5.10),

    $x_i'(t) = x_{i+1}(t), \qquad i = 1, \cdots, N-1,$

and the $x_1(t)$ process is a t.h.G.M$_N$. process.
















Case 1. S = 0 (deterministic case). In this case the x(t) process is deterministic:
(4.5.14)    $x(t) = e^{tQ} T \xi.$

Since Q satisfies its characteristic equation

(4.5.15)    $x^N - a_1 x^{N-1} - \cdots - a_N = 0$

it follows that

(4.5.16)    $x^{(N)}(t) - a_1 x^{(N-1)}(t) - \cdots - a_N x(t) = 0,$
(4.5.17)    $y^{(N)}(t) - a_1 y^{(N-1)}(t) - \cdots - a_N y(t) = 0.$


The roots of (4.5.15) are simple roots, and are all pure imaginary, according to
Theorem 4.3. It follows that

(4.5.18)    $y(t) = \sum_{j} (\eta_j \cos t\theta_j + \zeta_j \sin t\theta_j)$

where the $\eta_j$ and $\zeta_j$ are one-dimensional Gaussian variables, and $\{i\theta_j\}$ are the
distinct roots of (4.5.15).
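
As a numerical illustration of the step from (4.5.15) to (4.5.17), the following minimal Python sketch checks that a matrix of companion type satisfies its own characteristic equation. It assumes, as the relations $x_i'(t) = x_{i+1}(t)$ suggest, that Q has ones on the superdiagonal and last row $(a_N, \cdots, a_1)$; the helper names `companion`, `matmul` and `char_poly_residual` are hypothetical and are not part of the paper.

```python
def companion(a):
    """Assumed form of Q for x^N - a[0] x^(N-1) - ... - a[N-1] = 0.

    Rows 1..N-1 encode x_i' = x_{i+1}; the last row holds a_N, ..., a_1.
    """
    n = len(a)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        q[i][i + 1] = 1.0
    q[n - 1] = [float(c) for c in reversed(a)]
    return q


def matmul(x, y):
    """Plain square matrix product."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly_residual(a):
    """Return Q^N - a_1 Q^(N-1) - ... - a_N I, expected to be the zero matrix."""
    n = len(a)
    q = companion(a)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    powers = [identity]
    for _ in range(n):
        powers.append(matmul(powers[-1], q))
    res = [row[:] for row in powers[n]]
    for k, coeff in enumerate(a, start=1):  # a_k multiplies Q^(N-k)
        p = powers[n - k]
        for i in range(n):
            for j in range(n):
                res[i][j] -= coeff * p[i][j]
    return res


# Deterministic example with pure imaginary simple roots: x^2 + 4 = 0,
# i.e. a_1 = 0, a_2 = -4, so that y'' + 4y = 0.
residual = char_poly_residual([0.0, -4.0])
max_abs_residual = max(abs(v) for row in residual for v in row)
```

With these coefficients the roots are $\pm 2i$, so the solutions are of the form $\eta \cos 2t + \zeta \sin 2t$, in agreement with (4.5.18).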

Case 2. $S \ne 0$ (non-deterministic case). In this case it will now be shown
that the x(t) process has no deterministic factor, that is that the roots of (4.5.15)
all have negative real parts. In fact let $\beta$ be a root of (4.5.15), corresponding
to the characteristic vector z of Q*:

(4.5.19)    $z = (a_N \beta^{N-2},\; a_N \beta^{N-3} + a_{N-1} \beta^{N-2},\; \cdots,\; a_N + a_{N-1}\beta + \cdots + a_2 \beta^{N-2},\; \beta^{N-1}).$

Then using (4.3.5)

(4.5.20)    $0 < c\,|\beta|^{2N-2} = (Sz, z) = -(R(0)Q^*z, z) - (QR(0)z, z)$
                    $= -\beta (R(0)z, z) - \bar{\beta} (R(0)z, z)$
                    $= -(\beta + \bar{\beta})(R(0)z, z).$

Hence $\beta + \bar{\beta}$ is real and negative: $\beta$ has a negative real part. In this
non-deterministic case, therefore, the x(t) process can have no deterministic factor.
The matrix T is the null matrix, and (4.3.2) becomes

(4.5.21)    $x(t) = \int_{-\infty}^{t} e^{(t-s)Q} S\, d\xi(s)$
which leads to

(4.5.22)    $y(t) = c \int_{-\infty}^{t} [e^{(t-s)Q}]_{1N}\, d\xi_N(s).$





Moreover

(4.5.23)    $y'(t) = c \int_{-\infty}^{t} [Q e^{(t-s)Q}]_{1N}\, d\xi_N(s)$
            $\cdots$
            $y^{(N-1)}(t) = c \int_{-\infty}^{t} [Q^{N-1} e^{(t-s)Q}]_{1N}\, d\xi_N(s).$

Since Q satisfies its characteristic equation (4.5.15),

(4.5.24)    $\int_{-\infty}^{t} [Q^N e^{(t-s)Q}]_{1N}\, d\xi_N(s) - \sum_{j=1}^{N} a_j \int_{-\infty}^{t} [Q^{N-j} e^{(t-s)Q}]_{1N}\, d\xi_N(s) = 0.$

In other words

(4.5.25)    $c \int_{-\infty}^{t} [Q^N e^{(t-s)Q}]_{1N}\, d\xi_N(s) - a_1 y^{(N-1)}(t) - \cdots - a_N y(t) = 0.$

Now formally, if $y^{(N)}(t)$ existed, the last equation in (4.5.23) could be differentiated
to give

(4.5.26)    $y^{(N)}(t) = c\,[Q^{N-1}]_{1N}\, \xi_N'(t) + c \int_{-\infty}^{t} [Q^N e^{(t-s)Q}]_{1N}\, d\xi_N(s)$
and (4.5.25) would become
(4.5.27)    $y^{(N)}(t) - a_1 y^{(N-1)}(t) - \cdots - a_N y(t) = c\,\xi_N'(t).$
(We are using the fact that $(Q^{N-1})_{1N} = 1$.) Thus the t.h.G.M$_N$. processes satisfy
the formal differential equation (4.5.27) already discussed above from another
point of view. Equation (4.4.2) is readily justified.

THEOREM 4.6. (i) Let {x(t)} be the variables determining a t.h.G.M. process.
Then considered as functions of t, the x(t) are continuous with probability 1. Let
$\{y(t)\} = \{x_r(t)\}$ be the variables of a coordinate process.
(ii) If y'(t) exists, it is a linear combination of coordinate functions:

    $y'(t) = \sum_{j} c_j x_j(t).$

(iii) If $y'(t), \cdots, y^{(N-1)}(t)$ exist, y(t) satisfies a generalized differential equation
(4.4.1), that is the y(t) process is a t.h.G.M$_N$. process.

(iv) If $y'(t), \cdots, y^{(N)}(t)$ exist, y(t) has derivatives of all orders. The y(t) process
is a t.h.G.M$_N$. process (deterministic case) and y(t) therefore satisfies an Nth order
homogeneous differential equation (4.5.17).

(v) If $x_1'(t), \cdots, x_N'(t)$ exist, that is if x'(t) exists, the x(t) process is deterministic
and the coordinate functions have derivatives of all orders.

Proof of (i). It has already been shown that the $\{\zeta(t)\}$ determined by (4.3.14)
determine a $\zeta$-process, and the dispersion matrix function D(t) of the $\zeta$-process,
given by (4.3.16), is certainly continuous. Hence, by Theorem 4.1, the $\{\zeta(t)\}$,
and therefore the {x(t)}, are continuous in t, with probability 1.







Proof of (ii). If $x_r'(t)$ exists, the rth row of S in (4.3.2) must vanish, and $x_r'(t)$
is given by the rth coordinate of

(4.6.1)    $\int_{0}^{t} Q e^{(t-s)Q} S\, d\xi(s) + Q e^{tQ} x(0) = Q x(t).$
Hence
(4.6.2)    $x_r'(t) = \sum_{j} Q_{rj}\, x_j(t).$


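Proof of (iii). Suppose that $x_r'(t), \cdots, x_r^{(N-1)}(t)$ exist. Then (r is fixed in
the following equations)

(4.6.3)    $x_r(t) = \int_{0}^{t} \sum_{j=1}^{N} [e^{(t-s)Q} S]_{rj}\, d\xi_j(s) + \sum_{j=1}^{N} [e^{tQ}]_{rj}\, x_j(0),$
           $x_r'(t) = \int_{0}^{t} \sum_{j=1}^{N} [e^{(t-s)Q} Q S]_{rj}\, d\xi_j(s) + \sum_{j=1}^{N} [e^{tQ} Q]_{rj}\, x_j(0),$
           $\cdots$
           $x_r^{(N-1)}(t) = \int_{0}^{t} \sum_{j=1}^{N} [e^{(t-s)Q} Q^{N-1} S]_{rj}\, d\xi_j(s) + \sum_{j=1}^{N} [e^{tQ} Q^{N-1}]_{rj}\, x_j(0)$

and (in order that the derivatives can exist)

(4.6.4)    $(S)_{rj} = (QS)_{rj} = \cdots = (Q^{N-2} S)_{rj} = 0, \qquad j = 1, \cdots, N.$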
Since Q satisfies its characteristic equation, say (4.5.15), 


(4.6.5) [. [e“*-° Q” S] d&(s) — a f [Q* 7c} dé(s) = 0. 

This vector equation can be written (using only the rth coordinate) in the form 
(4.6.6) ive Sl; dé(s) — a,ai* (t) — --- — aya,(t) = 0. 

If ¢’(t) existed, the last equation in (4.6.3) could be differentiated to give 
467) 2 @ = [Lt sis abs(o) + D1" She 

and (4.6.7) would then become 

(4.6.8) x)? (t) — ari’ (t) — --- — aya,(t) = YQ" S580. 


Now the process with variables 


(4.6.9) ‘: ye Sts} 







is a $\xi$-process, if c is chosen properly, unless the parenthesis in (4.6.9) vanishes
for all j. In either case (iii) is proved.
Proof of (iv). If in (iii), $x_r^{(N)}(t)$ exists, (4.6.3) can be augmented to include

(4.6.3')    $x_r^{(N)}(t) = \int_{0}^{t} \sum_{j=1}^{N} [e^{(t-s)Q} Q^{N} S]_{rj}\, d\xi_j(s) + \sum_{j=1}^{N} [e^{tQ} Q^{N}]_{rj}\, x_j(0)$

and (4.6.4) now includes
(4.6.4')    $(Q^{N-1} S)_{rj} = 0, \qquad j = 1, \cdots, N.$

In this case the last term in (4.6.7) vanishes and (4.6.8), with zero on the right
hand side, is strictly true.
Proof of (v). If $x_1'(t), \cdots, x_N'(t)$ exist, S must vanish and (4.6.3) yields

(4.6.10)    $x(t) = e^{tQ} x(0), \qquad x^{(j)}(t) = Q^j x(t).$

Thus the x(t) process is deterministic and x(t) has derivatives of all orders.
THEOREM 4.7. Let {x(t)} be the variables of a one-dimensional t.h.G. process.
The process is a component process of an N-dimensional t.h.G.M. process if and
only if the chance variables

(4.7.1)    $x(0), \quad \{E\{x(s), s \le 0; x(t)\}\}, \qquad 0 < t < \infty$

are linearly dependent on N variables.

Suppose that the x(t) process is a component process of an N-dimensional
t.h.G.M. y(t) process: $x(t) = y_1(t)$, and let A(t) be the transition matrix function
of the y(t) process. Then if $\epsilon > 0$ and if n is any integer, the difference

    $y[(n + 1)\epsilon] - A(\epsilon)\, y(n\epsilon)$

is independent of every y(s) with $s \le n\epsilon$, and therefore independent of every
$y(m\epsilon)$ with $m \le n$. Hence the $y(n\epsilon)$ process is a t.h.G.M. process (discrete
parameter case). Equation (3.8.5) becomes, in this case, if n = 0,


(4.7.2)    $x[(N + \nu)\epsilon] - a_1^{(\epsilon)} x[(N + \nu - 1)\epsilon] - \cdots - a_N^{(\epsilon)} x(\nu\epsilon) = \eta^{(\epsilon)}(N + \nu)$


where $\eta^{(\epsilon)}(N + \nu)$ is not merely independent of the variables $\cdots, x(-\epsilon), x(0)$,
but is even independent of every x(s) with $s \le 0$. It then follows, applying the
operator $E\{x(s), s \le 0; \cdot\}$ to both sides of (4.7.2), that the variables in (4.7.1)
are linearly dependent on N variables if t is restricted to be a multiple of $\epsilon$.
Allowing $\epsilon$ to run through the values

    $1, \frac{1}{2!}, \cdots, \frac{1}{m!}, \cdots$

it follows that the statement of the theorem is true if t is restricted to be rational.
The proof will be complete when it is shown that the subject^{23} variables for rational
t are dense in the whole class in the sense that for any t, the expectation
(4.7.3)    $\delta = E\{[E\{x(s), s \le 0; x(t')\} - E\{x(s), s \le 0; x(t)\}]^2\}$
converges to 0 when $t' \to t$. In fact, using the Schwarz inequality

(4.7.4)    $\delta = E\{[E\{x(s), s \le 0; x(t') - x(t)\}]^2\}$
             $\le E\{E\{x(s), s \le 0; [x(t') - x(t)]^2\}\} = E\{[x(t') - x(t)]^2\}$

and the basic continuity hypothesis (1.3.1) imposed on continuous processes is
precisely that the last expectation converges to 0 when $t' \to t$.

23 Courtesy of U. S. Navy.

Conversely suppose that the chance variables (4.7.1) are linearly dependent
on N variables. It can be supposed that x(0) is one of these N. Let the others
be those for which $t = t_2, \cdots, t_N$, and define $y_1(t), \cdots, y_N(t)$ by

(4.7.5)    $y_1(t) = x(t)$
           $y_j(t) = E\{x(s), s \le t; x(t + t_j)\}, \qquad j = 2, \cdots, N.$

The y(t) process is obviously an N-dimensional t.h.G. process. Moreover

(4.7.6)    $E\{y(s), s \le 0; y_j(t)\} = E\{x(s), s \le 0; y_j(t)\}$
                  $= E\{x(s), s \le 0; x(t + t_j)\}, \qquad j = 1, \cdots, N$

(where $t_1$ is defined as 0). Since the right side is by hypothesis, for each j, a
linear combination of $y_1(0), \cdots, y_N(0)$, the y(t) process is a t.h.G.M. process,
and the x(t) process is a component process, as was to be shown.

A detailed examination will now be made of t.h.G.M$_N$. processes, and of the
more general class of component processes of t.h.G.M. processes. The following
theorem will be useful.

THEOREM 4.8. Let {x(t)} be the variables determining a t.h.G. continuous parameter
process. The process is a component process of an N-dimensional t.h.G.M.
process if and only if for each $\epsilon > 0$ the discrete parameter process with variables
$\{x(n\epsilon)\}$ is a component process of an N-dimensional t.h.G.M. process.

If the x(t) process is a component process of an N-dimensional t.h.G.M. y(t)
process, the $x(n\epsilon)$ process is a component process of the N-dimensional t.h.G.M.
$y(n\epsilon)$ process. Conversely suppose that the $x(n\epsilon)$ process is a component process
of an N-dimensional t.h.G.M. process (which may depend on $\epsilon$) for every $\epsilon > 0$.
It follows that for each $\epsilon > 0$ the chance variables

(4.8.1)    $E\{\cdots, x(-\epsilon), x(0); x(n\epsilon)\}, \qquad n = 0, 1, \cdots$

are linearly dependent on N of their number. Hence the same is true of the
following chance variables, if $\nu$, m are fixed and $\nu \ge m$:

(4.8.2)    $E\{\cdots, x(-1/\nu!), x(0); x(n/m!)\}, \qquad n = 0, 1, \cdots.$

According to the Corollary to Theorem 1.2, when $\nu \to \infty$ the conditional expectations
in (4.8.2) converge to

(4.8.3)    $E\{x(s), s \le 0, s \text{ rational}; x(n/m!)\}.$





Hence the chance variables (t rational)

(4.8.4)    $E\{x(s), s \le 0, s \text{ rational}; x(t)\} = E\{x(s), s \le 0; x(t)\},^{24} \qquad 0 < t < \infty,$

are linearly dependent on N of their number. As in the proof of Theorem 4.7
it follows that the same is true if t runs through all positive real numbers, and
according to Theorem 4.7, the x(t) process is therefore a component process of
an N-dimensional t.h.G.M. process.

THEOREM 4.9. Let {x(t)} be the variables of a one-dimensional continuous
parameter t.h.G. process. The process is a component process of a finite-dimensional
t.h.G.M. process if and only if the complex spectral function of the process
is the sum of the integral of the square of the absolute value of a rational function of
$\lambda$ and of a monotone non-decreasing function increasing only in a finite number of
jumps.^{25} Specifically:

(i) The process is a component process of an N-dimensional t.h.G.M. process if
and only if the complex spectral function has the form

(4.9.1)    $G(\lambda) = \int_{-\infty}^{\lambda} \frac{|\beta_0 (i\lambda)^{N-1} + \cdots + \beta_{N-1}|^2}{|(i\lambda)^N - a_1 (i\lambda)^{N-1} - \cdots - a_N|^2}\, d\lambda + \hat{G}(\lambda)$
where

(a) $\hat{G}(\lambda)$ is a monotone non-decreasing function satisfying (1.3.3) and increasing
only in jumps, at no more than N points;

(b) the denominator of the integrand vanishes at every discontinuity of $G(\lambda)$, and
the numerator vanishes at every zero of the denominator, to at least the same order;

(c) the coefficients in the integrand are real, and the roots of the polynomials are
all on the real axis or in the upper half plane.

The integral vanishes identically if and only if the x(t) process is a component
process of an N-dimensional deterministic process, and $\hat{G}(\lambda)$ vanishes identically if
and only if the variables {x(t)} vanish identically or the x(t) process is a component
process of an N-dimensional t.h.G.M. process with no deterministic factor.

(ii) The process is a t.h.G.M$_N$. process, in the deterministic case, if and only if
the complex spectral function $G(\lambda) = \hat{G}(\lambda)$ is a function increasing only in jumps,
at no more than N points; non-deterministic case if and only if the complex spectral
function has the form

(4.9.2)    $G(\lambda) = \int_{-\infty}^{\lambda} \frac{c\, d\lambda}{|(i\lambda)^N - a_1 (i\lambda)^{N-1} - \cdots - a_N|^2}.$


24 The equality (4.8.4) is proved as follows. Let t be fixed, and let z be the chance variable
on the left. Then x(t) - z has mean 0 and is uncorrelated with every x(s) with $s \le 0$ and
rational. It follows at once from the continuity hypothesis (1.3.1) that then x(t) - z is
uncorrelated with every x(s) with $s \le 0$: it follows that (4.8.4) is true.

25 It is easily seen that the first term of the two can also be described simply as the integral
of a rational function of $\lambda$, which is non-negative for real $\lambda$ and is integrable and an
even function, like all complex spectral density functions.



Proof of (i). Suppose that the x(t) process is a one-dimensional component
process of an N-dimensional t.h.G.M. y(t) process, $x(t) = y_1(t)$. It is no restriction
to assume that the y(t) process is non-singular. Then the correlation function
of the y(t) process is given by

    $R_y(t) = R_y(0)\, e^{tQ^*}, \qquad t \ge 0$
    $R_y(t) = e^{-tQ} R_y(0), \qquad t \le 0,$


where Q is uniquely determined and 


G0) — G0) = E [ ae R,(t) at| 


(4.9.3) 


(4.9.4) Qr 11 
Je an ve ll Rwlng 
"" 2 © at - _— 
at the points of continuity of G(A). 
The correlation function $R_y(t)$ has derivatives of all orders for $t \ne 0$:

(4.9.5)    $R_y^{(j)}(t) = R_y(0)\, Q^{*j} e^{tQ^*}, \qquad t > 0$
                       $= (-1)^j Q^j e^{-tQ} R_y(0), \qquad t < 0.$

Suppose first that the y(t) process has no deterministic factor, in other words
that it is non-degenerate and of type M. Then the characteristic values of Q
have negative real parts and $R_y(t) \to 0$ exponentially when $|t| \to \infty$. Hence
$G(\lambda)$ has a continuous derivative $G'(\lambda)$:

(4.9.6)    $G'(\lambda) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-it\lambda}\, [R_y(t)]_{11}\, dt.$


Integrating by parts, 
itr 


G0) = = [ 5 ol dt 


_ R,(0+) — R,(0-) 4i f° et 
(4.9.7) ~ Den)? Qa Lo (td)? 


_ RYO+) — RO—-) _ RYO+) — RVO—-) _ 1 f* &? pm 


Qx(ir)? Q(B Qt dw (id)? vd, 


Ri (t) dt 


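Since Q satisfies its characteristic equation

(4.9.8)    $x^N - a_1 x^{N-1} - \cdots - a_N = 0,$

it follows that

(4.9.9)    $R_y^{(N)}(t) - a_1 R_y^{(N-1)}(t) - \cdots - a_N R_y(t) = 0, \qquad t > 0$
           $R_y^{(N)}(t) + a_1 R_y^{(N-1)}(t) - \cdots + (-1)^{N-1} a_N R_y(t) = 0, \qquad t < 0$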





and therefore if U is the operator d/dt,

(4.9.10)    $[U^N - a_1 U^{N-1} - \cdots - a_N][U^N + a_1 U^{N-1} - \cdots + (-1)^{N-1} a_N]\, R_y(t) = 0, \qquad t \ne 0.$
Applying (4.9.10) to (4.9.6) 
[G@x)™ — ay(ir)”* — +--+ — awli(@)” + an(ary** + --- 
(4.9.11) + (—1)*" ay] G’(A) 
= |(a)" — a,(a)"~ — --- — av ?G@’A) = P(A) 


where $P(i\lambda)$ is a polynomial of degree 2N - 2. Since $P(i\lambda)$ is real and non-negative
when $\lambda$ is real, the roots on the real axis are of even multiplicity and
those off the axis are symmetric in the axis. Moreover $P(i\lambda)$ is even, since the
left side of (4.9.11) is even. It follows easily that $P(i\lambda)$ can be written in the form

(4.9.12)    $P(i\lambda) = \left| \sum_{j=0}^{N-1} \beta_j (i\lambda)^{N-1-j} \right|^2$

where the roots of the $\beta$ polynomial are all on or to the left of the imaginary axis.
Finally

(4.9.13)    $G'(\lambda) = \frac{|\beta_0 (i\lambda)^{N-1} + \cdots + \beta_{N-1}|^2}{|(i\lambda)^N - a_1 (i\lambda)^{N-1} - \cdots - a_N|^2}.$
The denominator polynomial in $\lambda$ vanishes only at points where $i\lambda$ has a negative
real part, that is where $\lambda$ has a positive imaginary part. This completes
the proof in the case where the N-dimensional y(t) process has no deterministic
factor. If there are such factors, it is easily verified that $G(\lambda)$ has corresponding
discontinuities and the above proof then applies to $G(\lambda)$ less its jump function.
The result can finally be summarized as in the statement of the theorem. If the
y(t) process has only deterministic factors $[R_y(t)]_{11}$ will be a sum of trigonometric
functions and $G(\lambda)$ will be a function of jumps.

Conversely suppose that the x(t) process has the complex spectral function
(4.9.13). Then following the ideas of the proof of the analogous section of
Theorem 3.9, it follows that R(t) satisfies the differential equation (cf. (3.9.10)
and (3.9.11))

(4.9.14)    $R^{(N)}(t) - a_1 R^{(N-1)}(t) - \cdots - a_N R(t) = 0, \qquad t > 0.$

Any solution of (4.9.14) is a linear combination of (at most N) functions
(4.9.15)    $e^{\beta t}, \; t e^{\beta t}, \; \cdots$

where $\beta$ is a root of the equation
(4.9.16)    $x^N - a_1 x^{N-1} - \cdots - a_N = 0$

and where powers of t may appear if $\beta$ is a multiple root. Let $\epsilon$ be a positive
number. The discrete parameter process determined by the variables $\{x(n\epsilon)\}$







has correlation function $R(n\epsilon)$. This function is a linear combination of functions

(4.9.15')    $(e^{\epsilon\beta})^n, \; n(e^{\epsilon\beta})^n, \; \cdots$

corresponding to those of (4.9.15). There is an equation

(4.9.17)    $x^N - a_1(\epsilon) x^{N-1} - \cdots - a_N(\epsilon) = 0$

with the $\{e^{\epsilon\beta}\}$ as roots, of the same multiplicity as that of $\beta$ in (4.9.16). Hence

(4.9.18)    $R[(n + N)\epsilon] - a_1(\epsilon) R[(n + N - 1)\epsilon] - \cdots - a_N(\epsilon) R(n\epsilon) = 0, \qquad n \ge 0.$

According to Theorem 3.10 the $x(n\epsilon)$ discrete parameter process is therefore
a component process of an N-dimensional discrete parameter t.h.G.M. process.
Since this is true for all $\epsilon$, the x(t) process is a component process of an N-dimensional
continuous parameter t.h.G.M. process.
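
The passage from the differential equation (4.9.14) to the difference equation (4.9.18) can be checked numerically. The following minimal Python sketch assumes simple roots $\beta_k$ (so that no powers of t appear) and a correlation function of the form $R(t) = \sum_k w_k e^{\beta_k t}$ for $t \ge 0$; the roots, the weights and the function names are illustrative choices only.

```python
import cmath


def poly_from_roots(roots):
    """Coefficients [1, c_1, ..., c_N] of prod_k (x - r_k), highest power first."""
    coeffs = [1.0 + 0j]
    for r in roots:
        nxt = coeffs + [0j]
        for i in range(1, len(nxt)):
            nxt[i] -= r * coeffs[i - 1]
        coeffs = nxt
    return coeffs


def discrete_coefficients(betas, eps):
    """a_1(eps), ..., a_N(eps): x^N - a_1 x^(N-1) - ... - a_N has roots exp(eps*beta_k)."""
    c = poly_from_roots([cmath.exp(eps * b) for b in betas])
    return [-cj for cj in c[1:]]


def correlation(t, betas, weights):
    """Assumed R(t) = sum_k w_k exp(beta_k t) for t >= 0 (simple roots only)."""
    return sum(w * cmath.exp(b * t) for w, b in zip(weights, betas))


def recurrence_residual(betas, weights, eps, n):
    """Left side of the difference equation for R(n*eps); should vanish."""
    big_n = len(betas)
    a = discrete_coefficients(betas, eps)
    total = correlation((n + big_n) * eps, betas, weights)
    for j in range(1, big_n + 1):
        total -= a[j - 1] * correlation((n + big_n - j) * eps, betas, weights)
    return total


betas = [-1 + 2j, -1 - 2j, -0.5 + 0j]      # roots with negative real parts
weights = [0.5 - 0.1j, 0.5 + 0.1j, 1.0]    # conjugate weights keep R(t) real
eps = 0.25
residuals = [abs(recurrence_residual(betas, weights, eps, n)) for n in range(6)]
coeffs = discrete_coefficients(betas, eps)
```

Because the roots come in conjugate pairs, the computed $a_j(\epsilon)$ are real up to rounding, and the residuals vanish for every $n \ge 0$.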

If the integral vanishes identically, the non-deterministic factors in the N-dimensional
process are irrelevant to the x(t) process and can be replaced by
factors of type M(0). If on the other hand the spectral function is continuous,
the deterministic factors are irrelevant and can be replaced by factors of
type M.

Proof of (ii). Since the t.h.G.M$_N$. processes are characterized among the
component processes of N-dimensional t.h.G.M. processes by the fact that the
first N - 1 derived processes exist, their spectral functions (according to Theorem
1.4) are characterized by the property that

    $\int_{-\infty}^{\infty} \lambda^{2N-2}\, dG(\lambda) < \infty,$

that is the numerator in (4.9.1) must be identically constant. If this constant
is not 0, $G(\lambda)$ can have no jumps, since each jump corresponds to a zero of numerator
and denominator. Hence $G(\lambda)$ is either identically $\hat{G}(\lambda)$ or is in the
form (4.9.2). The two possibilities obviously correspond to the deterministic
and non-deterministic cases, respectively.

COROLLARY. The t.h.G.M$_N$. one dimensional process which is the solution of
(4.4.1) has complex spectral function

(4.9.19)    $\frac{c^2}{2\pi} \int_{-\infty}^{\lambda} \frac{d\lambda}{|(i\lambda)^N - a_1 (i\lambda)^{N-1} - \cdots - a_N|^2}.$

In fact the complex spectral function has the form (4.9.2), where the coefficients
in the polynomial are those of the differential equation for the correlation
function $R_y(t)$ in (4.9.9), that is the coefficients of the characteristic equation of
the infinitesimal transition matrix Q (cf. (4.4.8)). The evaluation (4.9.19) is
also easily proved directly.
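
A rough numerical check of a spectral function of this type, for N = 2, is sketched below in Python. It rests on two assumptions that are not taken from the text: that the spectral density is $c^2 / (2\pi |(i\lambda)^N - a_1 (i\lambda)^{N-1} - \cdots - a_N|^2)$, and that the $\xi$-process is normalized so that its increments have variance equal to the elapsed time. Under these assumptions the total mass of the spectral function should equal the stationary variance $c^2/(2\gamma\omega^2)$ of the equation $y'' + \gamma y' + \omega^2 y = c\,\xi'$, written here with $a_1 = -\gamma$, $a_2 = -\omega^2$. The function names are hypothetical.

```python
import math


def char_poly(z, a):
    """Evaluate z^N - a[0] z^(N-1) - ... - a[N-1] by Horner's rule."""
    value = 1.0 + 0j
    for coeff in a:
        value = value * z - coeff
    return value


def spectral_density(lam, a, c):
    """Assumed density c^2 / (2 pi |P(i lam)|^2) of the spectral function."""
    return c * c / (2.0 * math.pi * abs(char_poly(1j * lam, a)) ** 2)


def variance_by_quadrature(a, c, cutoff=200.0, steps=200_000):
    """Trapezoidal approximation to the integral of the density over [-cutoff, cutoff]."""
    h = 2.0 * cutoff / steps
    total = 0.5 * (spectral_density(-cutoff, a, c) + spectral_density(cutoff, a, c))
    for k in range(1, steps):
        total += spectral_density(-cutoff + k * h, a, c)
    return total * h


gamma, omega2, c = 0.8, 2.0, 1.5          # illustrative damping, stiffness, noise level
a = [-gamma, -omega2]                     # coefficients in the sign convention used here
numeric_variance = variance_by_quadrature(a, c)
closed_form_variance = c * c / (2.0 * gamma * omega2)
```

The truncation at `cutoff` discards a tail of order $c^2/(3\pi \cdot \text{cutoff}^3)$, which is negligible at the tolerance used here.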

The analogues of Theorems 3.10 and 3.11 in the continuous parameter case 
are easy to prove and will be omitted. 





ON CUMULATIVE SUMS OF RANDOM VARIABLES 


By ABRAHAM WALD 


Columbia University 


1. Introduction. Let $\{z_i\}$ (i = 1, 2, $\cdots$, ad inf.) be a sequence of independent
random variables each having the same distribution. Denote by $Z_j$ the sum
of the first j elements of the sequence $\{z_i\}$, i.e.,

(1)    $Z_j = z_1 + z_2 + \cdots + z_j \qquad (j = 1, 2, \cdots, \text{ad inf.}).$

Let a be a given positive constant and b a given negative constant. Denote
by n the smallest positive integer for which $Z_n$ lies outside the open interval (b, a),
i.e., $Z_n$ is either $\le b$ or $\ge a$. Obviously n is a random variable. If $b < Z_j < a$
for j = 1, 2, $\cdots$, ad inf., we shall say that $n = \infty$.

For any relation R we shall denote the probability that R holds by P(R).
It will be shown later that $P(n = \infty) = 0$, provided the variance of $z_i$ is positive.

In this paper we shall deal with the problem of obtaining the value of $P(Z_n \ge a)$^{1}
and that of finding the probability distribution of n.

The study of such cumulative sums is of interest in various statistical problems.
For example, a multiple sampling scheme proposed recently by Walter
Bartky^{2} makes use of such cumulative sums.

Cumulative sums also play an important role in the theory of the random
walk of interest in physics. The results obtained in this paper may have bearing
particularly on the theory of the random walk with absorbing barriers. In
the presence of an absorbing wall the random walk stops whenever the particle
arrives at the wall, i.e., whenever the cumulative sum of the displacements
reaches a certain value.^{3}


2. Two Lemmas. Lemma 1. If the variance of $z_i$ is not zero, $P(n = \infty) = 0$.
Proof: Let c = |a| + |b|. If $n = \infty$ then for any positive integer r the
following inequalities must hold

(2)    $\left( \sum_{i=kr+1}^{(k+1)r} z_i \right)^2 < c^2 \qquad (k = 0, 1, 2, \cdots, \text{ad inf.}).$


To prove $P(n = \infty) = 0$, it is sufficient to show that the probability is zero that
(2) holds for all integer values of k. Since the variance of $z_i$ is not zero, the expected
value of $\left( \sum_{i=1}^{j} z_i \right)^2$ converges to $\infty$ as $j \to \infty$. Hence there exists a positive
integer r such that

(3)    $P\left[ \left( \sum_{i=1}^{r} z_i \right)^2 < c^2 \right] < 1.$

From (3) it follows that the probability that (2) is fulfilled for all values of k is
equal to zero. Hence $P(n = \infty) = 0$ and Lemma 1 is proved.

1 Since $P(n = \infty) = 0$, we have $P(Z_n \le b) = 1 - P(Z_n \ge a)$.

2 "Multiple sampling with constant probability", Annals of Math. Stat., Vol. 14 (1943),
pp. 363-377.

3 See in this connection S. Chandrasekhar, "Stochastic problems in physics and astronomy",
Rev. of Modern Physics, Vol. 15 (1943), p. 5.

Lemma 2. Let z be a random variable such that the following four conditions are
fulfilled:

Condition I. Both the expected value Ez of z and the variance of z exist and are
unequal to zero.

Condition II. There exists a positive $\delta$ such that $P(e^z < 1 - \delta) > 0$ and
$P(e^z > 1 + \delta) > 0$.

Condition III. For any real value h the expected value $Ee^{hz} = g(h)$ exists.

Condition IV. The first two derivatives of the function g(h) exist and may be
obtained by differentiation under the integral sign, i.e.,

    $g'(h) = \frac{d}{dh} Ee^{hz} = Eze^{hz}, \qquad g''(h) = \frac{d^2}{dh^2} Ee^{hz} = Ez^2 e^{hz}.$

Then there exists one and only one real value $h_0 \ne 0$ such that

    $Ee^{h_0 z} = 1.$


Proof: For any positive h we have
(4)    $g(h) > P(e^z > 1 + \delta)(1 + \delta)^h.$
Hence, since $P(e^z > 1 + \delta) > 0$,
(5)    $\lim_{h \to +\infty} g(h) = +\infty.$

Similarly we see that for any negative h

(6)    $g(h) > P(e^z < 1 - \delta)(1 - \delta)^h.$
Hence, since $P(e^z < 1 - \delta) > 0$, we have

(7)    $\lim_{h \to -\infty} g(h) = +\infty.$

Since $g''(h) = Ez^2 e^{hz}$ it follows easily from Condition II that

(8)    $g''(h) > 0,$

for all real values of h.





The relations (5), (7) and (8) imply that there exists exactly one real value
h* for which g(h) takes its minimum value. Since g'(0) = Ez is unequal to zero
by Condition I, we see that $h^* \ne 0$ and g(h*) < g(0) = 1. It is clear that the
function g(h) is monotonically decreasing in the strict sense over the interval
$(-\infty, h^*)$, and is monotonically increasing in the strict sense over the interval
$(h^*, +\infty)$. Since g(0) = 1 and g(h*) < 1, there exists exactly one real value
$h_0 \ne 0$ such that $g(h_0) = 1$. Hence Lemma 2 is proved.
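
The argument of Lemma 2 translates directly into a root search. The following minimal Python sketch, for a discrete random variable z with $Ez \ne 0$ that takes both positive and negative values, brackets $h_0$ on the side of the origin where g dips below 1 and then bisects. The function names `mgf` and `nonzero_root` and the two-point example are illustrative only.

```python
import math


def mgf(h, values, probs):
    """g(h) = E exp(h z) for a discrete random variable z."""
    return sum(p * math.exp(h * z) for z, p in zip(values, probs))


def nonzero_root(values, probs):
    """Find h0 != 0 with g(h0) = 1, assuming E z != 0 and z takes both signs."""
    mean = sum(p * z for z, p in zip(values, probs))
    sign = -1.0 if mean > 0 else 1.0     # g'(0) = E z, so g < 1 on this side near 0
    lo = sign
    while mgf(lo, values, probs) >= 1.0:  # shrink toward 0 until g(lo) < 1
        lo /= 2.0
    hi = lo
    while mgf(hi, values, probs) < 1.0:   # expand away from 0 until g(hi) >= 1
        hi *= 2.0
    for _ in range(200):                  # bisection keeps g(lo) < 1 <= g(hi)
        mid = 0.5 * (lo + hi)
        if mgf(mid, values, probs) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Two-point example: z = +1 with probability 0.6 and z = -1 with probability 0.4.
# Here g(h) = 0.6 e^h + 0.4 e^(-h), so h0 = log(0.4 / 0.6).
h0 = nonzero_root([1.0, -1.0], [0.6, 0.4])
```

Since g is strictly convex with g(0) = 1, the set where g < 1 is the open interval between 0 and $h_0$, which is what makes the bracketing step valid.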


3. A fundamental identity. Denote by z a random variable whose distribution
is equal to the common distribution of $z_i$ (i = 1, 2, $\cdots$, ad inf.). Let D'
be the subset of the complex plane such that $Ee^{zt} = \varphi(t)$ exists and is finite for
any point t in D'. Consider the following identity

(9)    $Ee^{Z_n t + (Z_N - Z_n)t} = Ee^{Z_N t} = [\varphi(t)]^N,$

where N denotes a positive integer. Let $P_N$ be the probability that $n \le N$.
For any random variable u denote by $E_N(u)$ the conditional expected value of
u under the restriction that $n \le N$, and by $E_N^*(u)$ the conditional expected
value of u under the restriction that n > N. Then identity (9) can be written as

(10)    $P_N E_N e^{Z_n t + (Z_N - Z_n)t} + (1 - P_N) E_N^* e^{Z_N t} = [\varphi(t)]^N.$

Since in the subpopulation defined by any fixed $n \le N$ the expression $Z_N - Z_n$
is independent of $Z_n$, we have

(11)    $E_N e^{Z_n t + (Z_N - Z_n)t} = E_N\{e^{Z_n t} [\varphi(t)]^{N-n}\}.$

From (10) and (11) we obtain the identity

(12)    $P_N E_N\{e^{Z_n t} [\varphi(t)]^{N-n}\} + (1 - P_N) E_N^* e^{Z_N t} = [\varphi(t)]^N.$
Dividing both sides by $[\varphi(t)]^N$ we obtain

(13)    $P_N E_N\{e^{Z_n t} [\varphi(t)]^{-n}\} + (1 - P_N) \frac{E_N^* e^{Z_N t}}{[\varphi(t)]^N} = 1.$

Let D'' be the subset of the complex plane in which $|\varphi(t)| \ge 1$ and denote by
D the common part of the subsets D' and D''. Since $\lim_{N=\infty} (1 - P_N) = 0$
and since $|E_N^*(e^{Z_N t})|$ is a bounded function of N, we have in D

(14)    $\lim_{N=\infty} (1 - P_N) \frac{E_N^* e^{Z_N t}}{[\varphi(t)]^N} = 0.$

Since

    $\lim_{N=\infty} P_N E_N\{e^{Z_n t} [\varphi(t)]^{-n}\} = E\{e^{Z_n t} [\varphi(t)]^{-n}\},$

we obtain from (13) and (14) the fundamental identity

(15)    $E\{e^{Z_n t} [\varphi(t)]^{-n}\} = 1$
for any point t in the set D.
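
The fundamental identity can be checked numerically in the simplest case. The following Python sketch assumes that z takes only the values +1 and -1 and that a and b are integers, so that $Z_n$ equals a or b exactly; it propagates the distribution of the walk before absorption and accumulates $e^{Z_n t} [\varphi(t)]^{-n}$ over the absorption events, for real t with $\varphi(t) \ge 1$. The function name `identity_lhs`, the parameter `max_steps` and the numerical values are illustrative assumptions.

```python
import math


def identity_lhs(p, a, b, t, max_steps=2000):
    """Approximate E{exp(Z_n t) phi(t)^(-n)} for a +/-1 walk stopped at a or b.

    p is P(z = +1); a > 0 and b < 0 are integers; the sum over n is
    truncated at max_steps, which is harmless because the probability of
    surviving that long decays geometrically.
    """
    q = 1.0 - p
    phi = p * math.exp(t) + q * math.exp(-t)
    alive = {0: 1.0}   # alive[k] = P(no exit so far and current sum = k)
    total = 0.0
    for n in range(1, max_steps + 1):
        nxt = {}
        for k, prob in alive.items():
            for step, w in ((1, p), (-1, q)):
                pos = k + step
                if pos >= a or pos <= b:
                    # absorption at step n contributes exp(Z_n t) phi^(-n)
                    total += prob * w * math.exp(pos * t) * phi ** (-n)
                else:
                    nxt[pos] = nxt.get(pos, 0.0) + prob * w
        alive = nxt
    return total


value_generic = identity_lhs(0.6, 3, -2, 0.3)                 # phi(0.3) > 1
value_at_h0 = identity_lhs(0.6, 3, -2, math.log(0.4 / 0.6))   # phi(h0) = 1
```

Both values should agree with 1 up to rounding; the second corresponds to substituting the root $h_0$ of Lemma 2 for t.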







4. Derivation of the probability that $Z_n \ge a$. In what follows in this and the
subsequent sections we shall always assume that the random variable z satisfies
the conditions I-IV of Lemma 2, even if this is not stated explicitly. Since it
follows from Condition III that the set D' is the whole complex plane, we see
that the identity (15) must hold for all points t for which $|\varphi(t)| \ge 1$.

Let $h_0 \ne 0$ be the real value for which $g(h_0) = 1$. Substituting $h_0$ for t in (15)
we obtain

(16)    $Ee^{Z_n h_0} = 1.$

Let $E_1$ be the conditional expected value of $e^{Z_n h_0}$ under the restriction that
$Z_n \ge a$ and let $E_0$ be the conditional expected value of $e^{Z_n h_0}$ under the restriction
that $Z_n \le b$. Furthermore denote $P(Z_n \ge a)$ by $\alpha$. Then it follows from (16)

(17)    $\alpha E_1 + (1 - \alpha) E_0 = 1.$
Hence
(18)    $\alpha = \frac{1 - E_0}{E_1 - E_0}.$
If $h_0 > 0$ then $E_1 > 1$ and $E_0 < 1$. Hence (18) implies the inequality
(19)    $\alpha < \frac{1}{E_1} \le \frac{1}{e^{a h_0}} \qquad (h_0 > 0).$
If $h_0 < 0$ then $E_1 < 1$ and $E_0 > 1$. Hence (18) implies the inequality
(20)    $1 - \alpha < \frac{1}{E_0} \le \frac{1}{e^{b h_0}} \qquad (h_0 < 0).$

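We shall now derive lower and upper limits for $E_0$ and $E_1$. We derive these
limits under the assumption that $h_0 > 0$. To obtain a lower limit of $E_0$ consider
a real variable $\zeta$ which is restricted to values > 1. For any random variable u
and any relation R we shall denote by E(u | R) the conditional expected value
of u under the restriction that R holds. Denote by $P(\zeta)$ the probability that
$e^{h_0 Z_{n-1}} < \zeta e^{b h_0}$. Then we have

(21)    $E_0 = e^{b h_0} \int_{1}^{\infty} \zeta\, E\left( e^{h_0 z} \mid e^{h_0 z} \le \frac{1}{\zeta} \right) dP(\zeta).$

Hence a lower bound of $E_0$ is given by

(22)    $E_0 \ge e^{b h_0} \left\{ \text{g.l.b.}_{\zeta}\; \zeta\, E\left( e^{h_0 z} \mid e^{h_0 z} \le \frac{1}{\zeta} \right) \right\}$

where the symbol g.l.b. stands for greatest lower bound with respect to $\zeta$. Since
$e^{b h_0}$ is an upper bound of $E_0$, we obtain the limits

(23)    $e^{b h_0} \left\{ \text{g.l.b.}_{\zeta}\; \zeta\, E\left( e^{h_0 z} \mid e^{h_0 z} \le \frac{1}{\zeta} \right) \right\} \le E_0 \le e^{b h_0} \qquad (h_0 > 0).$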




Let $\rho$ be a real variable restricted to values > 0 and < 1. Denote by $Q(\rho)$
the probability that $e^{h_0 Z_{n-1}} < \rho e^{a h_0}$. Then similarly to (21) we obtain

(24)    $E_1 = e^{a h_0} \int_{0}^{1} \rho\, E\left( e^{h_0 z} \mid e^{h_0 z} \ge \frac{1}{\rho} \right) dQ(\rho).$

Hence an upper bound of $E_1$ is given by

(25)    $e^{a h_0} \left\{ \text{l.u.b.}_{\rho}\; \rho\, E\left( e^{h_0 z} \mid e^{h_0 z} \ge \frac{1}{\rho} \right) \right\}.$

Since $e^{a h_0}$ is a lower bound of $E_1$, we obtain the following limits for $E_1$

(26)    $e^{a h_0} \le E_1 \le e^{a h_0} \left\{ \text{l.u.b.}_{\rho}\; \rho\, E\left( e^{h_0 z} \mid e^{h_0 z} \ge \frac{1}{\rho} \right) \right\} \qquad (h_0 > 0).$

In a similar way upper and lower limits can be derived for $E_0$ and $E_1$ when $h_0 < 0$.
With the help of these limits upper and lower limits for $\alpha$ can be derived on
the basis of equation (18). If $h_0 > 0$ then $E_1 > 1$, $E_0 < 1$ and consequently
the right hand side of (18) is a monotonically decreasing function of $E_0$ and $E_1$.
Hence if $E_i'$ is a lower, and $E_i''$ is an upper bound of $E_i$ (i = 0, 1), then

(27)    $\frac{1 - E_0''}{E_1'' - E_0''} \le \alpha \le \frac{1 - E_0'}{E_1' - E_0'} \qquad (h_0 > 0).$

In a similar way limits for $\alpha$ can be obtained when $h_0 < 0$. If both the absolute
value of Ez and the variance of z are small, $E_0$ and $E_1$ will be nearly equal to
$e^{b h_0}$ and $e^{a h_0}$, respectively. Hence, in this case a good approximation to $\alpha$ is
given by the expression

(28)    $\bar{\alpha} = \frac{1 - e^{b h_0}}{e^{a h_0} - e^{b h_0}}.$

The difference $\bar{\alpha} - \alpha$ approaches zero if both the mean and standard deviation
of z converge to zero.
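
Substituting $e^{b h_0}$ and $e^{a h_0}$ for $E_0$ and $E_1$ in (18) gives the approximation $(1 - e^{b h_0})/(e^{a h_0} - e^{b h_0})$. The following Python sketch compares this value with the exact absorption probability in the one case where no excess over the boundaries occurs, namely z = +1 or -1 and integer a, b. The exact value is obtained by iterating the first-step relation u(k) = p u(k+1) + q u(k-1); the function names and numerical parameters are illustrative only.

```python
import math


def wald_alpha(h0, a, b):
    """(1 - exp(b h0)) / (exp(a h0) - exp(b h0)): excess over the boundaries neglected."""
    return (1.0 - math.exp(h0 * b)) / (math.exp(h0 * a) - math.exp(h0 * b))


def exact_alpha(p, a, b, sweeps=2000):
    """P(a +/-1 walk started at 0 reaches a before b), by Gauss-Seidel sweeps."""
    q = 1.0 - p
    u = {k: 0.0 for k in range(b, a + 1)}   # u[k] = P(reach a before b | start at k)
    u[a] = 1.0                              # boundary values: u[a] = 1, u[b] = 0
    for _ in range(sweeps):
        for k in range(b + 1, a):
            u[k] = p * u[k + 1] + q * u[k - 1]
    return u[0]


p, a, b = 0.6, 3, -2
h0 = math.log((1.0 - p) / p)     # root of p e^h + q e^(-h) = 1 other than h = 0
approx = wald_alpha(h0, a, b)
exact = exact_alpha(p, a, b)
```

In this special case the two values coincide up to rounding, in line with the remark that the error comes only from the excess of $Z_n$ over the boundaries.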


5. The characteristic function of n. Let $\bar{Z}_n$ be a random variable defined as
follows: $\bar{Z}_n = a$ if $Z_n \ge a$ and $\bar{Z}_n = b$ if $Z_n \le b$. Denote the difference $Z_n - \bar{Z}_n$
by $\epsilon$. Then $\epsilon$ is a random variable.

In what follows we shall neglect $\epsilon$, i.e., we shall substitute 0 for $\epsilon$. No error
is committed by doing so in the special case when z can take only two values d
and -d and the ratios a/d and b/d are integers, since in this case $\epsilon$ is exactly zero.
Apart from this special case the variate $\epsilon$ will not be identical with the constant
zero. However, the smaller the values |Ez| and $Ez^2$, the smaller the error we
commit by neglecting $\epsilon$. In fact, for arbitrary small positive numbers $\delta_1$ and $\delta_2$
the inequality $P(|\epsilon| < \delta_1) > 1 - \delta_2$ will hold if |Ez| and $Ez^2$ are sufficiently
small. Thus in the limiting case when Ez and $Ez^2$ approach zero the random
variable $\epsilon$ reduces to the constant zero.







(a). The characteristic function of n when only one of the quantities a and b is finite.
It will be sufficient to treat the case when $a$ is finite and $b = -\infty$. In this case
$n$ is defined as the smallest positive integer for which $Z_n \ge a$. To make the
probability of the existence of such a value $n$ to be equal to 1 we have to assume
that the expected value $\mu$ of $z$ is positive. Since $b = -\infty$, the fundamental
identity (15) need not hold for all points $t$ of the set $D$. However, it follows
easily from (13) that (15) holds for all points $t$ in $D$ whose real part is non-negative.
Denote by $\psi(\tau)$ the characteristic function of $n$ ($\tau$ is a purely imaginary
variable). Since $Z_n = a$ (neglecting $\epsilon$), and


        $E[\varphi(t)]^{-n} = \psi[-\log \varphi(t)],$

identity (15) can be written as

(29)    $e^{at}\, \psi[-\log \varphi(t)] = 1.$

Let $t(\tau)$ denote a root (with non-negative real part) of the equation in $t$

(30)    $\log \varphi(t) + \tau = 0,$

and substitute $t(\tau)$ for $t$ in (29). Then we obtain

(31)    $\psi(\tau) = e^{-a\, t(\tau)}.$


As an illustration let us calculate $\psi(\tau)$ in the case when $z$ is normally distributed.
In this case


        $\log \varphi(t) = \mu t + \frac{\sigma^2}{2}\, t^2,$


where $\mu$ is the mean and $\sigma$ is the standard deviation of $z$. Hence

(32)    $t(\tau) = \frac{-\mu \pm \sqrt{\mu^2 - 2\sigma^2 \tau}}{\sigma^2}.$

If we take the + sign before the square root sign, the real part of $t(\tau)$ is
non-negative, since the real part of $\sqrt{\mu^2 - 2\sigma^2\tau}$ is greater than or equal to $\mu$. Hence
the characteristic function of $n$ is given by


(33)    $\psi(\tau) = e^{-\frac{a}{\sigma^2}\left[-\mu + \sqrt{\mu^2 - 2\sigma^2\tau}\right]}$    $(\mu > 0).$


(b). The characteristic function of n when a and b both are finite. Given the value
of $n$, let $p_n$ be the conditional probability that $Z_n = a$. Let $p_n^*$ denote the probability
that $n$ is the smallest positive integer for which either $Z_n = a$ or $Z_n = b$
holds. Neglecting $Z_n - \bar{Z}_n$, identity (15) can be written as


(34)    $\sum_n p_n^* \left[p_n e^{at} + (1 - p_n) e^{bt}\right] [\varphi(t)]^{-n} = 1.$


Let $\psi_1(\tau)$ be the characteristic function of $n$ in the subpopulation where $Z_n = a$,
and let $\psi_2(\tau)$ be the characteristic function of $n$ in the subpopulation where $Z_n = b$.
Furthermore let $\psi(\tau)$ be the characteristic function of $n$ in the total population.

Since we neglect the difference $Z_n - \bar{Z}_n$, it follows from (18) that the probability
$\alpha$ that $Z_n = a$ is given by

(35)    $\alpha = \frac{1 - e^{b h_0}}{e^{a h_0} - e^{b h_0}}.$

Putting $1 - p_n = q_n$ the following relations hold

(36)    $\psi_1[-\log \varphi(t)] = \frac{\sum_n p_n p_n^* [\varphi(t)]^{-n}}{\sum_n p_n p_n^*} = \frac{1}{\alpha} \sum_n p_n p_n^* [\varphi(t)]^{-n},$

(37)    $\psi_2[-\log \varphi(t)] = \frac{\sum_n q_n p_n^* [\varphi(t)]^{-n}}{\sum_n q_n p_n^*} = \frac{1}{1 - \alpha} \sum_n q_n p_n^* [\varphi(t)]^{-n},$

(38)    $\psi[-\log \varphi(t)] = \sum_n p_n^* [\varphi(t)]^{-n} = \sum_n (p_n + q_n)\, p_n^* [\varphi(t)]^{-n} = \alpha\, \psi_1[-\log \varphi(t)] + (1 - \alpha)\, \psi_2[-\log \varphi(t)].$

Putting $-\log \varphi(t) = \tau$ we obtain from (34), (36) and (37)

(39)    $\alpha\, \psi_1(\tau)\, e^{at} + (1 - \alpha)\, \psi_2(\tau)\, e^{bt} = 1.$


According to Lemma 2 the equation $-\log \varphi(t) = 0$ has two different real roots
in $t$, $t = 0$ and $t = h_0$, and $\varphi'(0)$ and $\varphi'(h_0)$ both are unequal to zero. Hence, if
$\varphi(t)$ is not singular at $t = 0$ and $t = h_0$, the equation


        $-\log \varphi(t) = \tau$


has two roots $t_1(\tau)$ and $t_2(\tau)$ for sufficiently small values of $\tau$ such that
$\lim_{\tau \to 0} t_1(\tau) = 0$ and $\lim_{\tau \to 0} t_2(\tau) = h_0$. Since the identity (15) holds for all values of $t$ for
which $|\varphi(t)| \ge 1$, and since $|\varphi[t_1(\tau)]| = |\varphi[t_2(\tau)]| = 1$ for all imaginary values
of $\tau$, it follows from (39) that both equations hold


(39')   $\alpha\, \psi_1(\tau)\, e^{a t_1(\tau)} + (1 - \alpha)\, \psi_2(\tau)\, e^{b t_1(\tau)} = 1,$
(39'')  $\alpha\, \psi_1(\tau)\, e^{a t_2(\tau)} + (1 - \alpha)\, \psi_2(\tau)\, e^{b t_2(\tau)} = 1.$


Solving these two linear equations we obtain $\psi_1(\tau)$ and $\psi_2(\tau)$. The characteristic
function $\psi(\tau)$ is given by


        $\psi(\tau) = \alpha\, \psi_1(\tau) + (1 - \alpha)\, \psi_2(\tau).$


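As an illustration we shall determine $\psi_1(\tau)$, $\psi_2(\tau)$ and $\psi(\tau)$ when $z$ has a normal
distribution with mean $\mu$ and standard deviation $\sigma$. We have


        $-\log \varphi(t) = -\mu t - \frac{\sigma^2}{2}\, t^2 = \tau.$


Hence

(40)    $t_1(\tau) = -\frac{\mu}{\sigma^2} + \frac{1}{\sigma^2}\sqrt{\mu^2 - 2\sigma^2\tau}, \qquad t_2(\tau) = -\frac{\mu}{\sigma^2} - \frac{1}{\sigma^2}\sqrt{\mu^2 - 2\sigma^2\tau}.$

Putting $e^a = A$ and $e^b = B$ we obtain from (39) and (40)

(41)    $\alpha\, \psi_1(\tau)\, A^{-\mu/\sigma^2 + (1/\sigma^2)\sqrt{\mu^2 - 2\sigma^2\tau}} + (1 - \alpha)\, \psi_2(\tau)\, B^{-\mu/\sigma^2 + (1/\sigma^2)\sqrt{\mu^2 - 2\sigma^2\tau}} = 1,$
(42)    $\alpha\, \psi_1(\tau)\, A^{-\mu/\sigma^2 - (1/\sigma^2)\sqrt{\mu^2 - 2\sigma^2\tau}} + (1 - \alpha)\, \psi_2(\tau)\, B^{-\mu/\sigma^2 - (1/\sigma^2)\sqrt{\mu^2 - 2\sigma^2\tau}} = 1.$

These two equations are valid for any imaginary value of $\tau$. Since $h_0 = -2\mu/\sigma^2$,
we obtain from (35)

(43)    $\alpha = \frac{1 - B^{-2\mu/\sigma^2}}{A^{-2\mu/\sigma^2} - B^{-2\mu/\sigma^2}}.$

Let

(44)    $g_1 = -\frac{\mu}{\sigma^2} + \frac{1}{\sigma^2}\sqrt{\mu^2 - 2\sigma^2\tau}$

and

(45)    $g_2 = -\frac{\mu}{\sigma^2} - \frac{1}{\sigma^2}\sqrt{\mu^2 - 2\sigma^2\tau}.$


Then we obtain from (41) and (42)

(46)    $\alpha\, \psi_1(\tau) = \frac{B^{g_2} - B^{g_1}}{A^{g_1} B^{g_2} - A^{g_2} B^{g_1}}$

and

(47)    $(1 - \alpha)\, \psi_2(\tau) = \frac{A^{g_1} - A^{g_2}}{A^{g_1} B^{g_2} - A^{g_2} B^{g_1}}.$


Hence the characteristic function of $n$ is given by

(48)    $\psi(\tau) = \frac{A^{g_1} + B^{g_2} - A^{g_2} - B^{g_1}}{A^{g_1} B^{g_2} - A^{g_2} B^{g_1}}.$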
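Formula (48) lends itself to a quick numerical check. The Python sketch below is illustrative only and is not part of the original derivation; the function names `alpha` and `psi` and the numerical values of the boundaries, mean and standard deviation are assumptions made for the example. It evaluates (43), (44), (45) and (48), so that one can confirm $\psi(0) = 1$ and compare a finite-difference slope of $\psi$ at $\tau = 0$ with $[\alpha a + (1 - \alpha) b]/\mu$, the expected value of $n$ implied by the fundamental identity when $Z_n - \bar{Z}_n$ is neglected.

```python
import cmath
import math


def alpha(a, b, mu, sigma):
    """Approximate probability (43) that the sum ends at the upper boundary a."""
    h0 = -2.0 * mu / sigma**2
    return (1.0 - math.exp(b * h0)) / (math.exp(a * h0) - math.exp(b * h0))


def psi(tau, a, b, mu, sigma):
    """Approximate characteristic function (48) of n for normally distributed z."""
    root = cmath.sqrt(mu**2 - 2.0 * sigma**2 * tau)
    g1 = (-mu + root) / sigma**2  # equation (44)
    g2 = (-mu - root) / sigma**2  # equation (45)
    # With A = e^a and B = e^b, A**g = exp(a*g) and B**g = exp(b*g).
    a_g1, a_g2 = cmath.exp(a * g1), cmath.exp(a * g2)
    b_g1, b_g2 = cmath.exp(b * g1), cmath.exp(b * g2)
    return (a_g1 + b_g2 - a_g2 - b_g1) / (a_g1 * b_g2 - a_g2 * b_g1)


# Hypothetical example values: boundaries a = 3, b = -2, mean 0.5, s.d. 1.
a, b, mu, sigma = 3.0, -2.0, 0.5, 1.0
step = 1e-6
# Central difference approximating the derivative of psi at tau = 0.
slope = (psi(step, a, b, mu, sigma) - psi(-step, a, b, mu, sigma)).real / (2 * step)
expected_n = (alpha(a, b, mu, sigma) * a + (1 - alpha(a, b, mu, sigma)) * b) / mu
```

Because (48) is symmetric under interchange of $g_1$ and $g_2$, the branch chosen for the complex square root does not affect the result.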











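6. The distribution of n when z is normally distributed. (a) The case when
$a$ is finite and $b = -\infty$. In this case the characteristic function of $n$ is given
by (33). Let


(49)    $m = \frac{\mu^2}{2\sigma^2}\, n.$

Then the characteristic function of $m$ is given by


(50)    $\psi^*(t) = e^{c\,(1 - \sqrt{1 - t})},$

where

(51)    $c = \frac{a\mu}{\sigma^2}.$

The distribution of $m$ is given by

(52)    $\frac{1}{2\pi i} \int_{-i\infty}^{i\infty} e^{c(1 - \sqrt{1 - t}) - mt}\, dt.$

Let


(53)    $G(c, m) = \frac{1}{2\pi i} \int_{-i\infty}^{i\infty} e^{-c\sqrt{1 - t} - mt}\, dt,$


(54)    $H(c, m) = \frac{1}{2\pi i} \int_{-i\infty}^{i\infty} \frac{1}{\sqrt{1 - t}}\, e^{-c\sqrt{1 - t} - mt}\, dt.$

Since

(55)    $\frac{\partial}{\partial t}\, e^{-c\sqrt{1 - t} - mt} = \left(\frac{c}{2\sqrt{1 - t}} - m\right) e^{-c\sqrt{1 - t} - mt},$


we have


(56)    $\frac{c}{2}\, H(c, m) - m\, G(c, m) = \frac{1}{2\pi i} \int_{-i\infty}^{i\infty} \frac{\partial}{\partial t}\, e^{-c\sqrt{1 - t} - mt}\, dt = 0.$


From (53) and (54) we obtain


(57)    $\frac{\partial H(c, m)}{\partial c} + G(c, m) = 0.$


From (56) and (57) it follows that


(58)    $\frac{c}{2}\, H(c, m) + m\, \frac{\partial H(c, m)}{\partial c} = 0.$


Hence


(59)    $\log H(c, m) = -\frac{c^2}{4m} + \log \lambda(m)$


where $\lambda(m)$ is some function of $m$ only. Thus

(60)    $H(c, m) = \lambda(m)\, e^{-c^2/4m}.$

Now we shall determine $\lambda(m)$. We have


(61)    $\lambda(m) = H(0, m) = \frac{1}{2\pi i} \int_{-i\infty}^{i\infty} \frac{1}{\sqrt{1 - t}}\, e^{-mt}\, dt.$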







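Since $(1 - t)^{-1/2}$ is the characteristic function of $\tfrac{1}{2}\chi^2$ where $\chi^2$ has the $\chi^2$-distribution
with one degree of freedom, the right hand side of (61) is equal to


        $\frac{1}{\Gamma(\tfrac{1}{2})\sqrt{m}}\, e^{-m}.$


Hence


(62)    $\lambda(m) = \frac{1}{\Gamma(\tfrac{1}{2})\sqrt{m}}\, e^{-m}.$


From (60) and (62) we obtain


(63)    $H(c, m) = \frac{1}{\Gamma(\tfrac{1}{2})\sqrt{m}}\, e^{-c^2/4m - m}.$

From (56) and (63) we obtain


(64)    $G(c, m) = \frac{c}{2\,\Gamma(\tfrac{1}{2})\, m^{3/2}}\, e^{-c^2/4m - m}.$


Hence the distribution of $m$ is given by


(65)    $F(m)\, dm = \frac{c}{2\,\Gamma(\tfrac{1}{2})\, m^{3/2}}\, e^{-c^2/4m - m + c}\, dm$    $(0 < m < \infty).$


Let $m = \frac{c}{2}\, m^*$. Then the distribution of $m^*$ is given by

(66)    $D(m^*)\, dm^* = \frac{c}{2\,\Gamma(\tfrac{1}{2})\,(c/2)^{1/2}\,(m^*)^{3/2}}\, e^{-\frac{c}{2}\left(\frac{1}{m^*} + m^* - 2\right)}\, dm^* = \frac{\sqrt{c}}{\sqrt{2\pi}\,(m^*)^{3/2}}\, e^{-\frac{c}{2}\left(\frac{1}{m^*} + m^* - 2\right)}\, dm^*.$

The function $\frac{1}{m^*} + m^* - 2$ is non-negative and is equal to zero only when $m^* = 1$.
If $c$ is large, then $D(m^*)$ is exceedingly small for values of $m^*$ not close to 1.
Expanding $\frac{1}{m^*} + m^* - 2$ in a Taylor series around $m^* = 1$, we obtain


(67)    $\frac{1}{m^*} + m^* - 2 = (m^* - 1)^2 + \text{higher order terms}.$


Hence for large $c$


(68)    $D(m^*)\, dm^* \sim \frac{\sqrt{c}}{\sqrt{2\pi}}\, e^{-\frac{c}{2}(m^* - 1)^2}\, dm^*,$


i.e., if $c$ is large $m^*$ is nearly normally distributed with mean equal to 1 and
standard deviation $\frac{1}{\sqrt{c}}$.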
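The density (65) and the statement about the mean and standard deviation of $m^*$ can be checked numerically. The following Python sketch is an illustration under stated assumptions rather than part of the paper: it takes $c = a\mu/\sigma^2$ (the constant that (33) and (50) imply), uses a hypothetical value $c = 16$, and relies on a plain midpoint rule; the helper names are invented for the example.

```python
import math


def density_m(m, c):
    """Density (65) of m for a positive constant c."""
    if m <= 0.0:
        return 0.0
    # The exponent is never positive, since c^2/(4m) + m >= c.
    exponent = -c * c / (4.0 * m) - m + c
    return c / (2.0 * math.gamma(0.5) * m**1.5) * math.exp(exponent)


def midpoint_integral(f, lower, upper, steps):
    """Midpoint-rule approximation of the integral of f over [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(f(lower + (k + 0.5) * width) for k in range(steps))


c = 16.0  # hypothetical value of the constant c
total = midpoint_integral(lambda m: density_m(m, c), 0.0, 80.0, 100_000)
mean = midpoint_integral(lambda m: m * density_m(m, c), 0.0, 80.0, 100_000)
second = midpoint_integral(lambda m: m * m * density_m(m, c), 0.0, 80.0, 100_000)
sd_m_star = math.sqrt(second - mean**2) / (c / 2.0)  # s.d. of m* = 2m/c
```

With these inputs `total` should be close to 1, `mean` close to $c/2$ (so that $m^*$ has mean 1), and `sd_m_star` close to $1/\sqrt{c}$, in line with the normal approximation described above.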







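(b). The case when a and b both are finite. In this case the characteristic function
of $n$ is given by (48). Let

        $m = \frac{\mu^2}{2\sigma^2}\, n$  and  $d = -\frac{\mu}{\sigma^2}.$


Then the characteristic function of $m$ is given by


(69)    $\psi^*(t) = \frac{A^{g_1} + B^{g_2} - A^{g_2} - B^{g_1}}{A^{g_1} B^{g_2} - A^{g_2} B^{g_1}},$


where

(70)    $g_1 = d\,(1 - \sqrt{1 - t}), \qquad g_2 = d\,(1 + \sqrt{1 - t}),$

and $t$ is an imaginary variable. Putting $A^d = \bar{A}$, $B^d = \bar{B}$, $da = \bar{a}$ and $db = \bar{b}$,
the characteristic function of $m$ can be written as

(71)    $\psi^*(t) = \frac{\bar{A}\left(e^{-\bar{a}\sqrt{1-t}} - e^{\bar{a}\sqrt{1-t}}\right) + \bar{B}\left(e^{\bar{b}\sqrt{1-t}} - e^{-\bar{b}\sqrt{1-t}}\right)}{\bar{A}\bar{B}\left(e^{-(\bar{a}-\bar{b})\sqrt{1-t}} - e^{(\bar{a}-\bar{b})\sqrt{1-t}}\right)} = \frac{\bar{A}\left(e^{-\bar{b}\sqrt{1-t}} - e^{(2\bar{a}-\bar{b})\sqrt{1-t}}\right) + \bar{B}\left(e^{\bar{a}\sqrt{1-t}} - e^{(\bar{a}-2\bar{b})\sqrt{1-t}}\right)}{\bar{A}\bar{B}\left(1 - e^{2(\bar{a}-\bar{b})\sqrt{1-t}}\right)}.$

It will be sufficient to consider only the case when $\mu > 0$, since the case $\mu < 0$ can
be treated in a similar way. Then $d < 0$ and $\bar{b} > 0$. Since the real part of
$\sqrt{1 - t}$ is greater than or equal to one, we have


(72)    $\left| e^{(\bar{a} - \bar{b})\sqrt{1 - t}} \right| < 1$

for any imaginary value of $t$. Let

(73)    $T = e^{2(\bar{a} - \bar{b})\sqrt{1 - t}}.$

Then

(74)    $\frac{1}{1 - T} = 1 + T + T^2 + \cdots.$


From (71) and (74) it follows that $\psi^*(t)$ can be written in the form of an infinite
series


(75)    $\psi^*(t) = \sum_{i=1}^{\infty} r_i\, e^{-\lambda_i \sqrt{1 - t}},$


where $\lambda_i$ and $r_i$ are constants and $\lambda_i > 0$. Each term of this series is a characteristic
function of the form given in (50) except for a proportionality factor. Let
$F_i(m)$ be the distribution of $m$ corresponding to the characteristic function
$e^{\lambda_i - \lambda_i \sqrt{1 - t}}$. Then $F_i(m)$ can be obtained from (65) by substituting $\lambda_i$ for $c$. Since
we may integrate the right hand side member of (75) term by term, the distribution
of $m$ is given by


(76)    $F(m)\, dm = \left(\sum_{i=1}^{\infty} r_i\, e^{-\lambda_i} F_i(m)\right) dm.$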







Since $m$ is a discrete variable, it may seem paradoxical that we obtained a
probability density function for $m$. However, the explanation lies in the fact
that we neglected $\epsilon = Z_n - \bar{Z}_n$, and this quantity is zero only in the limiting case
when $\mu$ and $\sigma$ approach zero.

If $|\mu|$ and $\sigma$ are sufficiently small as compared with $a$ and $|b|$, the distribution
of $m$ given in (76) will be a good approximation to the exact distribution of
$m$, even if $z$ is not normally distributed. The reason for this can be indicated
as follows: Let

(77)    $z_i^* = \sum_{j=(i-1)r+1}^{ir} z_j$    $(i = 1, 2, \ldots, \text{ad inf.})$

where $r$ is a given positive integer. Since the variates $z_j$ are independently distributed,
each having the same distribution, under some weak conditions the
variates $z_i^*$ $(i = 1, 2, \ldots, \text{ad inf.})$ will be nearly normally distributed for large $r$.
Hence, considering the cumulative sums $Z_i^* = z_1^* + z_2^* + \cdots + z_i^*$ $(i = 1, 2, \ldots,$
ad inf.), the distribution given in (76) is applicable with good approximation,
provided that $r|\mu|$ and $\sqrt{r}\,\sigma$ are small as compared with $a$ and $|b|$ so that the
difference $\epsilon^* = Z_n^* - \bar{Z}_n^*$ can be neglected.


7. The exact probability distribution of $Z_n$ and the exact characteristic function
of n when z can take only integral multiples of a given constant d. In the
previous sections we derived the probability $P(Z_n \ge a)$ and the characteristic
function of $n$ under the assumption that the quantity by which $Z_n$ may differ
from $a$ or $b$ is small and can be neglected. This can be done whenever $|Ez|$
and $Ez^2$ are small. However, if $|Ez|$ or $Ez^2$ is not small, it is desirable to derive
the exact probability distribution of $Z_n$ and the exact characteristic function of
$n$. Both are obtained in the present section for random variables $z$ which can
take only a finite number of integral multiples of a given constant $d$. This is a
rather general result, since any distribution of $z$ can be approximated arbitrarily
closely by a discrete distribution of the above type if the constant $d$ is chosen
sufficiently small.

There is no loss of generality in assuming that $d = 1$, since the quantity $d$
can be chosen as the unit of measurement. Thus, we shall assume that $z$ takes
only a finite number of integral values. Let $g_1$ and $g_2$ be two positive integers
such that $P(z = -g_1)$ and $P(z = g_2)$ are positive and $z$ can take only integral
values $\ge -g_1$ and $\le g_2$. Denote $P(z = i)$ by $h_i$. Then the characteristic
function of $z$ is given by


(78)    $\varphi(t) = \sum_{i=-g_1}^{g_2} h_i\, e^{it}.$


To obtain the roots of the equation $\varphi(t) = 1$, we put $e^t = u$ and solve the equation


(79)    $\sum_{i=-g_1}^{g_2} h_i\, u^i = 1.$







Denote $g_1 + g_2$ by $g$ and let the $g$ roots of (79) be $u_1, \ldots, u_g$, respectively. We
shall assume that no two roots are equal, i.e., $u_i \ne u_j$ for $i \ne j$. Substituting
$u_i$ for $e^t$ in the identity (15) we obtain


(80)    $E\left(u_i^{Z_n}\right) = 1$    $(i = 1, \ldots, g).$


Denote by $[a]$ the smallest integer $\ge a$, and by $[b]$ the largest integer $\le b$. Then
$Z_n$ can take only the values


(81)    $[b] - g_1 + 1,\ [b] - g_1 + 2,\ \ldots,\ [b],\ [a],\ [a] + 1,\ \ldots,\ [a] + g_2 - 1.$


Denote the $g$ different integers in (81) by $c_1, \ldots, c_g$, respectively. Furthermore,
denote $P(Z_n = c_i)$ by $\xi_i$. Then equations (80) can be written as


(82)    $\sum_{j=1}^{g} \xi_j\, u_i^{c_j} = 1$    $(i = 1, \ldots, g).$


Let $\Delta$ be the determinant value of the matrix $\| u_i^{c_j} \|$ $(i, j = 1, \ldots, g)$ and let
$\Delta_j$ be the determinant we obtain from $\Delta$ by substituting 1 for the elements in
the $j$th column. If $\Delta \ne 0$, it follows from (82) that $P(Z_n = c_j) = \xi_j$ is given by


(83)    $\xi_j = \frac{\Delta_j}{\Delta}.$


Hence, $P(Z_n \ge a) = \sum_j (\Delta_j / \Delta)$ summed for all values of $j$ for which $c_j \ge a$.


From the probability distribution of $Z_n$ we can easily derive the expected value
$En$ of $n$. In fact, differentiating the fundamental identity (15) with respect to
$t$ at $t = 0$ we obtain


(84)    $E\left[Z_n - n\, \frac{\varphi'(0)}{\varphi(0)}\right] = 0.$


Since $\frac{\varphi'(0)}{\varphi(0)} = Ez$, we obtain from (84)


(85)    $En = \frac{EZ_n}{Ez} = \frac{1}{Ez} \sum_{j=1}^{g} \frac{c_j \Delta_j}{\Delta}.$
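As a worked instance of (79) to (85), consider the simplest case in which $z$ takes only the values $+1$ and $-1$. The Python sketch below is illustrative and not taken from the paper: the function names, the probability $p = 3/5$ and the boundaries $a = 3$, $b = -2$ are assumptions for the example. Here $g_1 = g_2 = 1$, equation (79) has the two roots $u = 1$ and $u = (1 - p)/p$, and the terminal values in (81) are just $b$ and $a$. The exact result from (83) and (85) is compared with an independent iterative solution of the one-step recurrences.

```python
from fractions import Fraction


def terminal_distribution(p, a, b):
    """Exact [P(Z_n = b), P(Z_n = a)] when z = +1 with probability p, else -1.

    Assumes integers a > 0 > b and p != 1/2, so the roots of (79) are distinct.
    """
    q = 1 - p
    roots = [Fraction(1), q / p]      # roots of q/u + p*u = 1, i.e. (79)
    values = [b, a]                   # the g = 2 terminal values in (81)
    matrix = [[u**c for c in values] for u in roots]

    def det2(m):
        return m[0][0] * m[1][1] - m[0][1] * m[1][0]

    delta = det2(matrix)
    xi = []
    for j in range(2):
        replaced = [row[:] for row in matrix]
        for i in range(2):
            replaced[i][j] = Fraction(1)   # substitute 1 in the jth column
        xi.append(det2(replaced) / delta)  # equation (83)
    return xi


def expected_n(p, a, b):
    """Equation (85): En = E(Z_n) / E(z)."""
    xi_b, xi_a = terminal_distribution(p, a, b)
    return (b * xi_b + a * xi_a) / (p - (1 - p))


def by_iteration(p, a, b, sweeps=5000):
    """Independent check: relax the one-step recurrences for each start x."""
    p, q = float(p), 1.0 - float(p)
    hit = {x: 0.0 for x in range(b, a + 1)}
    hit[a] = 1.0
    dur = {x: 0.0 for x in range(b, a + 1)}
    for _ in range(sweeps):
        for x in range(b + 1, a):
            hit[x] = p * hit[x + 1] + q * hit[x - 1]
            dur[x] = 1.0 + p * dur[x + 1] + q * dur[x - 1]
    return hit[0], dur[0]


p = Fraction(3, 5)
xi = terminal_distribution(p, 3, -2)   # [Fraction(76, 211), Fraction(135, 211)]
```

Exact rational arithmetic is used for the determinants so that the two probabilities sum to 1 without rounding error; for larger $g$ the roots of (79) would in general have to be found numerically.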
Now we shall derive the exact characteristic function of $n$. Denote by
$\psi_j(\tau)$ ($\tau$ is a purely imaginary variable) the characteristic function of the conditional
distribution of $n$ under the restriction that $Z_n = c_j$. Let $t_1(\tau), \ldots, t_g(\tau)$
be $g$ roots of the equation


(86)    $\varphi(t) = e^{-\tau}$

such that


(87)    $\lim_{\tau \to 0} e^{t_i(\tau)} = u_i.$


Substituting $t_i(\tau)$ for $t$ in the fundamental identity (15) we obtain


(88)    $\sum_{j=1}^{g} \xi_j\, e^{c_j t_i(\tau)}\, \psi_j(\tau) = 1$    $(i = 1, \ldots, g).$







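These equations are linear in the unknowns $\psi_1(\tau), \ldots, \psi_g(\tau)$ and the determinant
of these equations is given by


(89)    $\delta(\tau) = \begin{vmatrix} \xi_1 e^{c_1 t_1(\tau)} & \cdots & \xi_g e^{c_g t_1(\tau)} \\ \xi_1 e^{c_1 t_2(\tau)} & \cdots & \xi_g e^{c_g t_2(\tau)} \\ \vdots & & \vdots \\ \xi_1 e^{c_1 t_g(\tau)} & \cdots & \xi_g e^{c_g t_g(\tau)} \end{vmatrix}.$


Obviously, $\delta(0) = \xi_1 \xi_2 \cdots \xi_g \Delta$. Hence if $\xi_i \ne 0$ $(i = 1, \ldots, g)$ and $\Delta \ne 0$,
also $\delta(0) \ne 0$ and consequently $\delta(\tau) \ne 0$ for any $\tau$ with sufficiently small absolute
value. Thus, $\psi_1(\tau), \ldots, \psi_g(\tau)$ can be obtained by solving the linear equations
(88). The characteristic function $\psi(\tau)$ of the unconditional distribution of $n$
is given by


(90)    $\psi(\tau) = \sum_{j=1}^{g} \xi_j\, \psi_j(\tau).$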





SOME IMPROVEMENTS IN WEIGHING AND OTHER EXPERIMENTAL
TECHNIQUES¹


By Harold Hotelling


Columbia University 


When several quantities are to be ascertained there is frequently an oppor- 
tunity to increase the accuracy and reduce the cost by combining suitably in one 
experiment what might ordinarily be considered separate operations. The 
theory of design of experiments developed as a branch of modern mathematical 
statistics, and of which fundamental considerations are set forth in R. A. Fisher’s 
book [1], provides many improvements of this kind. Since the main interests 
of Fisher and other originators of this theory have been in biology, the applica- 
tions so far made have been chiefly biological in character, excepting for certain 
economic and social investigations involving stratified sampling. The possi- 
bilities of improvement of physical and chemical investigations through designed 
experiments based on the theory of statistical inference have scarcely begun to 
be explored. 

The following example is due to F. Yates [2]. A chemist has seven light objects
to weigh, and the scale also requires a zero correction, so that eight weighings
are necessary. The standard error of each weighing is denoted by $\sigma$, the
variance therefore by $\sigma^2$. Since the weight assigned to each object by customary
techniques is the difference between the reading of the scale when carrying that
object and when empty, the variance of the assigned weight is $2\sigma^2$, and its standard
error is $\sigma\sqrt{2}$.

The improved technique suggested by Yates consists of weighing all seven
objects together, and also weighing them in groups of three so chosen that each
object is weighed four times altogether, twice with any other object and twice
without it. Calling the readings from the scale $y_1, \ldots, y_8$ we then have as
equations for determining the unknown weights $a, b, \ldots, g$,


        a + b + c + d + e + f + g = y_1
        a + b + c                 = y_2
        a         + d + e         = y_3
        a                 + f + g = y_4
            b     + d     + f     = y_5
            b         + e     + g = y_6
                c + d         + g = y_7
                c     + e + f     = y_8.


1 Presented at the Wellesley meeting of the Institute of Mathematical Statistics, Aug. 
13, 1944. 




Any particular weight is found by adding together the four equations containing
it, subtracting the other four, and dividing by 4. Thus


        $a = \frac{y_1 + y_2 + y_3 + y_4 - y_5 - y_6 - y_7 - y_8}{4}.$


The variance of a sum of independent observations is the sum of their variances,
as is well known, and the variance of $c$ times an observation is $c^2$ times the
variance of that observation. Taking $c = \tfrac{1}{4}$ for the first four terms in the expression
for $a$ and $c = -\tfrac{1}{4}$ for the others gives for the variance of $a$ by this method
$\sigma^2/2$, which is only one-fourth that for the direct method. The standard error,
or probable error, has been halved. If a degree of accuracy is required calling
for repetition a certain number of times of the weighings by the direct method,
then only one-fourth as many weighings are needed by Yates' method to procure
the same accuracy in the average.

A further improvement, which does not seem to have been mentioned in the
literature, will be obtained if Yates' procedure is modified by placing in the other
pan of the scale those of the objects not included in one of his weighings. Calling
the readings in this case $z_1, \ldots, z_8$, we have


         a + b + c + d + e + f + g = z_1
         a + b + c - d - e - f - g = z_2
         a - b - c + d + e - f - g = z_3
         a - b - c - d - e + f + g = z_4
        -a + b - c + d - e + f - g = z_5
        -a + b - c - d + e - f + g = z_6
        -a - b + c + d - e - f + g = z_7
        -a - b + c - d + e + f - g = z_8.


From these equations,


        $a = \frac{z_1 + z_2 + z_3 + z_4 - z_5 - z_6 - z_7 - z_8}{8},$


with a like expression for each of the other unknowns. The variance of each
unknown by this method is $\sigma^2/8$. The standard error is half that by Yates'
method, or a quarter of its value by the direct method of weighing each object
separately. The number of repetitions required to procure a particular standard
error in the mean is one-sixteenth that by the direct method.
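The two variance figures can be verified mechanically. The Python sketch below is an illustration, not part of the paper; the function names are invented, and the grouping of the objects into threes is read off from the equations for Yates's scheme. For each object it forms the coefficients applied to the eight readings, checks that the resulting estimate reproduces that object's weight and no other, and returns the variance in units of $\sigma^2$ as the sum of squared coefficients.

```python
from fractions import Fraction

OBJECTS = "abcdefg"
# Yates's eight weighings: all seven objects, then the seven groups of three.
GROUPS = ["abcdefg", "abc", "ade", "afg", "bdf", "beg", "cdg", "cef"]


def design(absent):
    """One row per weighing: 1 if the object is weighed, `absent` otherwise.

    absent = 0 gives Yates's scheme; absent = -1 puts the remaining objects
    in the other pan.
    """
    return [[1 if obj in grp else absent for obj in OBJECTS] for grp in GROUPS]


def coefficients(obj, divisor):
    """Weights applied to the eight readings to estimate one object."""
    return [Fraction(1 if obj in grp else -1, divisor) for grp in GROUPS]


def check(obj, absent, divisor):
    """Return (is_unbiased, variance in units of sigma^2) for the estimate."""
    x = design(absent)
    coef = coefficients(obj, divisor)
    target = OBJECTS.index(obj)
    unbiased = all(
        sum(c * row[j] for c, row in zip(coef, x)) == (1 if j == target else 0)
        for j in range(len(OBJECTS))
    )
    return unbiased, sum(c * c for c in coef)


yates = check("a", absent=0, divisor=4)        # (True, Fraction(1, 2))
two_pan = check("a", absent=-1, divisor=8)     # (True, Fraction(1, 8))
```

In both schemes the eight coefficients sum to zero, so a constant zero error of the scale cancels from every estimate.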

A simpler example illustrating the same point is that of two objects to be
weighed, with a scale already corrected for bias. Again let $\sigma^2$ be the variance
of an individual weighing. If we weigh the two objects together in one pan of
the scale, and then in opposite pans, we have as equations for the unknown
weights $a$ and $b$,


        $a + b = z_1, \qquad a - b = z_2,$


whence

        $a = (z_1 + z_2)/2, \qquad b = (z_1 - z_2)/2.$


The variances of $a$ and $b$ by this method are both equal to $\sigma^2/2$, half the value
when the two objects are weighed separately. The means found from a number
of pairs of weighings of sums and differences have the same precision as those
found from twice as many pairs of weighings of the objects separately.

Further economies of effort, or gains in accuracy, are possible with larger 
numbers of weighings and of objects to be weighed. These improvements can 
to some extent be applied also to other types of measurement, as of distances, 
since it is sometimes possible to measure the sum of a number of such quantities, 
or the difference between two such sums, with approximately the same accuracy 
as a single one of them. The outstanding case, however, seems to be that of 
weighing on a balance objects light enough so that their aggregate weight is 
below the maximum for which the balance was designed, since in this case it is 
quite reasonable to assume that the several recorded results all have the same 
standard error $\sigma$ and that they are independent.

In what follows, some principles underlying the design of efficient schemes of 
this kind will be developed and applied to obtain some additional plans. How- 
ever no comprehensive general solution has been reached; this appears to be a 
matter for further mathematical research. Also, we leave aside in this paper 
the problem of estimating the error variance. All this discussion is based on 
the minimization of the actual variance. In order to utilize the results it is 
necessary that this variance be either known a priori or estimated from the 
residuals from the least-square solution. The latter type of estimate is in some 
ways more satisfactory, since it refers to the actual experiment rather than to 
some previous experiments which may not have been made under exactly the 
same conditions. But in order to have such an estimate it is necessary that the 
number of observations exceed the number of unknowns, and desirable that the 
excess shall have a large enough value to insure a stable estimate of the error 
variance $\sigma^2$. The appropriate test for significance, or determination of confidence
limits for the unknowns, must then utilize the Student distribution or its generalization,
the variance ratio distribution, which take full account of the instability
caused by an inadequate number of degrees of freedom for estimating $\sigma$.

It is only when $\sigma$ is known exactly apart from the experiment being designed
that the criteria we here consider are exactly applicable. In other cases there 
may need to be a balancing, in the design of the experiment, between the de- 
siderata of minimum variance and of accurately known variance, with the accu- 
racy of this knowledge depending on the number of available degrees of freedom. 
A theory of design taking full account of this consideration would require a use 
of the power functions of the Student distribution and the variance ratio dis- 
tribution, discovered respectively by R. A. Fisher [3] and P. C. Tang [4]. 

We shall denote by $N$ the number of weighings to be made, and by $p$ the number
of objects to be weighed. In order that it be possible to determine the unknown
weights from the observations it is necessary that $p \le N$, and if a possible
bias in the scale must be eliminated by means of the same data it is necessary
that $p \le N - 1$. Supposing these conditions to be satisfied, we shall show,
among other things, that the minimum possible variance for one of the unknowns
is $\sigma^2/N$; that the experiment may be arranged so that a selected one of
the unknowns has exactly this minimum variance excepting when $N$ is odd and
a bias must be allowed for also; and that for some, but not all, combinations of
$p$ and $N$, this minimum variance is attained for all the unknowns simultaneously.
This minimum value $\sigma^2/N$ is of course equal to the variance of the mean of $N$
weighings of one object alone, disregarding the rest; but it will be seen below
that by complex experiments of the kind indicated, determinations from the
same number of weighings of the other weights also can at the same time be
made with some finite variance, which may or may not have the minimum value.
The following notation will be used in the proof. Let $x_{i\alpha} = 1$ or $-1$ if the
$i$th object is included in the $\alpha$th weighing by being placed respectively in the
left- or right-hand pan, and let $x_{i\alpha} = 0$ if the $i$th object is not included in the $\alpha$th
weighing. Here $i = 1, 2, \ldots, p$ and $\alpha = 1, \ldots, N$. Let $y_\alpha$ be the result
recorded for the $\alpha$th weighing, let $\Delta_\alpha$ be the error in this result, and let $b_i$ be the
true weight of the $i$th object, so that we have the $N$ equations


(1)    $x_{1\alpha} b_1 + x_{2\alpha} b_2 + \cdots + x_{p\alpha} b_p = y_\alpha + \Delta_\alpha,$


provided there is no bias, or if by $y_\alpha$ we mean the observed weight corrected for
a bias known a priori. Under these conditions the estimate of each of the $b_i$'s
having the properties of zero bias and minimum variance is that provided by
the method of least squares. This statement, which does not depend on any
assumption of a normal or other particular form of distribution of the errors,
has been known long but not widely, since there is an easier derivation of the
method by the application to the normal distribution of the method of maximum
likelihood. Its proof, due originally to Laplace, has appeared in many forms
in the work of Gauss and later authors [5]; the latest version is by the present
writer [6].
Letting $S$ stand for summation over all the $N$ weighings we put


(2)    $a_{ij} = S\, x_{i\alpha} x_{j\alpha}, \qquad g_i = S\, x_{i\alpha} y_\alpha,$

and write the normal equations in the form

        $\sum a_{ij} b_j = g_i,$

where $\sum$ stands for a sum with respect to $j$ from 1 to $p$. From the usual theory
of least squares (cf. for example the reference last cited) it is known that the
standard error of the determination of $b_1$ from these equations (which is the
minimum possible standard error of $b_1$ for any way of combining the observations)
is $\sigma$ times the square root of $\Delta_{11}/\Delta$, where


        $\Delta = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pp} \end{vmatrix}$


and $\Delta_{11}$ is the minor of $\Delta$ obtained by deleting the first row and column.

The matrices of $\Delta$ and of $\Delta_{11}$ are known to be positive definite or semi-definite.
The semi-definite case is excluded by the consideration that the normal equations
shall actually determine the unknowns. Hence the inverse of the latter
matrix exists and is positive definite. But this inverse, which we may write


        $d = \begin{pmatrix} d_{22} & \cdots & d_{2p} \\ \vdots & & \vdots \\ d_{p2} & \cdots & d_{pp} \end{pmatrix},$


consists of the coefficients in the identity


        $\Delta / \Delta_{11} = a_{11} - \sum_{i,j=2}^{p} d_{ij}\, a_{1i}\, a_{1j},$

which is obtained by expanding $\Delta$ with reference to its first row and first column.
The positive definite character of $d$ therefore leads to the following

Lemma: If $a_{12}, \ldots, a_{1p}$ ($= a_{21}, \ldots, a_{p1}$ respectively) are free to vary while the
other elements of $\Delta$ remain fixed, the maximum value of $\Delta/\Delta_{11}$ is $a_{11}$, and is attained
when and only when $a_{12} = a_{13} = \cdots = a_{1p} = 0$.

From this it is evident that the variance of $b_1$, namely $\sigma^2 \Delta_{11}/\Delta$, cannot be less
than $\sigma^2/a_{11}$, and will reach this value only if the experiment is so arranged that
the elements after the first in the first row and column of $\Delta$ are all zero. That
such an arrangement is possible may be seen by a consideration of the matrix


        $X = \begin{pmatrix} x_{11} & x_{21} & \cdots & x_{p1} \\ x_{12} & x_{22} & \cdots & x_{p2} \\ \vdots & \vdots & & \vdots \\ x_{1N} & x_{2N} & \cdots & x_{pN} \end{pmatrix},$


whose elements are restricted to be 1's, $-1$'s and 0's. The condition $a_{1i} = 0$,
by (2), means simply that $S\, x_{1\alpha} x_{i\alpha} = 0$, a condition which may be expressed by
saying that the first column of $X$ is orthogonal to the $i$th column. The condition
that the variance of $b_1$ have its minimum value $\sigma^2/a_{11}$ is thus, according to
the lemma, that the first column of $X$ shall be orthogonal to all the others. The
minimum minimorum of this variance will be reached if the first column of $X$ is
not only orthogonal to all the others, but consists entirely of 1's and $-1$'s, so
that $a_{11} = N$. The value of this minimum minimorum is $\sigma^2/N$.

If there is a possible bias $b_0$ this procedure needs to be modified by the addition
of $b_0$ to the left member of (1) and subsequent treatment of this term like the
others, putting $x_{0\alpha} = 1$ in (2), and modifying $X$ by adjoining a column of 1's.
The necessary and sufficient condition that the variance of $b_1$ shall equal $\sigma^2/N$
is then that the column


        $\begin{pmatrix} x_{11} \\ x_{12} \\ \vdots \\ x_{1N} \end{pmatrix}$


shall consist entirely of 1's and $-1$'s and shall be orthogonal to a column consisting
of 1's, and to all the other columns of $X$.

If no bias needs to be eliminated the experiment can be arranged so that the
variance of $b_1$ is $\sigma^2/N$ merely by filling up the first column of $X$ with 1's and
$-1$'s in any arbitrary manner, and then choosing the later columns so as to be
orthogonal to this first one, and so that all are linearly independent. This can
be accomplished, for example, by choosing the first element in all the columns to
be the same as that in the first column; choosing the $i$th element in the $i$th column
$(i = 2, 3, \ldots, p)$ to be the negative of the $i$th element in the first column; and
making all the other elements of $X$ zero.
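The filling rule just described can be written out directly. The Python sketch below is illustrative only; the function names and the particular first column are assumptions for the example. It builds $X$ from an arbitrary first column of 1's and $-1$'s and forms the sums of products $a_{ij}$ of (2), so that one can confirm $a_{11} = N$ and $a_{1j} = 0$ for the later columns, which by the lemma gives $b_1$ the variance $\sigma^2/N$.

```python
def build_design(first_column, p):
    """Fill X (N rows, p columns) by the constructive rule described above."""
    n = len(first_column)
    if p > n or any(v not in (1, -1) for v in first_column):
        raise ValueError("need p <= N and a first column of 1's and -1's")
    x = [[0] * p for _ in range(n)]
    for alpha in range(n):
        x[alpha][0] = first_column[alpha]
    for i in range(1, p):
        x[0][i] = first_column[0]      # same first element as column 1
        x[i][i] = -first_column[i]     # negative of column 1's ith element
    return x


def gram(x):
    """The matrix of sums of products a_ij = S x_i x_j from (2)."""
    p = len(x[0])
    return [[sum(row[i] * row[j] for row in x) for j in range(p)]
            for i in range(p)]


# Hypothetical example: N = 5 weighings, p = 4 objects.
x = build_design([1, -1, -1, 1, 1], 4)
a = gram(x)   # a[0][0] is N and a[0][1:] are all zero
```

As the text goes on to say, this rule is convenient for the proof rather than for practice, since the later columns consist mostly of zeros.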

When a bias is to be eliminated, so that there is a column of 1's in $X$ corresponding
to $b_0$, it is necessary that $N$ be even in order that the column of $X$
corresponding to $b_1$ may consist of 1's and $-1$'s in equal numbers, without any
0's, a condition essential for the orthogonality between these two columns with
the maximum value $N$ for $a_{11}$. Supposing $N$ even, let us assign the value 1 to
each of the first $N/2$ elements of the column corresponding to $b_1$ and the value
$-1$ to the last $N/2$ elements of this column. The remaining columns of $X$ may then
be filled up by the same method as that indicated above for the case in which
there is no bias. The variance of $b_1$ will then take its theoretical minimum
value $\sigma^2/N$.

If $N$ is odd and there is a possible bias, the column of $X$ corresponding to $b_1$
can be filled up with 1's and $-1$'s in equal numbers, with a single zero, and the
remaining columns can be made orthogonal to it. The variance of $b_1$ in this
case will be $\sigma^2/(N - 1)$.

The method suggested above for filling up the later columns of $X$ is convenient
for the proof, but is not usually to be recommended in practice, since other
methods will in all but the simplest cases give smaller standard errors for the
unknowns other than the first. For some values of $N$ and $p$ it is possible to
determine all the unknowns with equal and minimum variance. These are the
cases in which all the columns of $X$ can be made mutually orthogonal and
without zeros, excepting that the column corresponding to $b_0$ may contain some
zeros. Thus for $N = 4$ the scheme of weighing represented by the matrix


        $X = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix},$


whose columns are all mutually orthogonal, may be applied to weigh three objects
when there is a possible bias, or four where there is not, with variance
$\sigma^2/4$ for each of the unknowns in either case. The matrix $X'X$ of the normal
equations has the form

        $\begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix}.$

Calling the results of the weighings $y_0, y_1, y_2, y_3$ in the case of possible bias we
have for the unknowns the expressions


        $b_1 = (y_0 + y_1 - y_2 - y_3)/4$
        $b_2 = (y_0 - y_1 + y_2 - y_3)/4$
        $b_3 = (y_0 - y_1 - y_2 + y_3)/4.$

The complete orthogonality exemplified by this design has several advantages 
besides the fact that the variance of each of the unknowns has the same mini- 
mum value as if all the weighings were to be devoted to it alone (or half the value 
of the variance of this unknown if half the weighings were devoted to it plus 
bias and half to determining the bias). The diagonal form of the matrix X’X 
means that the labor of solving normal equations, which is sometimes formidable, 
is reduced to the trivial task of dividing by N. Also, the diagonal form of this 
matrix implies that its inverse is also of diagonal form, from which it follows 
that the estimates of the different unknowns are statistically independent. Con- 
sequently the variances, or standard errors, of linear functions of the unknowns 
are easy to find. Thus the variance of the difference between the estimates of 
two of the weights is simply the sum of their variances. But of course if the 
main object of the experiment is to determine a particular difference of this 
kind, or any other linear function of the weights, a different design should be 
sought to minimize the particular variance which is of interest. 
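The reduction of the normal equations to a division by $N$ can be seen in a few lines. The Python sketch below is an illustration under stated assumptions: it uses the four-weighing design with mutually orthogonal columns of 1's and $-1$'s, treats the first column as the bias, and invents error-free readings for a hypothetical bias of 1 and weights 2, 5 and 7; the function names are not from the paper.

```python
from fractions import Fraction

# Rows are the four weighings; the first column is the bias, the rest the objects.
X = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, 1, -1],
    [1, -1, -1, 1],
]


def normal_matrix(x):
    """X'X, the matrix of the normal equations."""
    p = len(x[0])
    return [[sum(row[i] * row[j] for row in x) for j in range(p)]
            for i in range(p)]


def estimates(x, y):
    """Least-squares estimates when X'X = N I: each is (column . y) / N."""
    n = len(x)
    return [Fraction(sum(row[j] * yk for row, yk in zip(x, y)), n)
            for j in range(len(x[0]))]


# Hypothetical error-free readings for bias 1 and weights 2, 5, 7.
truth = [1, 2, 5, 7]
readings = [sum(xv * bv for xv, bv in zip(row, truth)) for row in X]
recovered = estimates(X, readings)   # equals [1, 2, 5, 7]
```

Because the inverse of $X'X$ is diagonal with elements $1/4$, each estimate has variance $\sigma^2/4$ and the estimates are uncorrelated.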

In contrast to the satisfactory design possible with four weighings, no complete
orthogonality is possible with six weighings, or with any odd number, if the
number of objects to be weighed is the maximum possible for the number of
weighings and if each object is actually to enter into each weighing in one pan
or the other. For $N = 3$ and bias known to be zero consider the scheme


        $X = \begin{pmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 0 \end{pmatrix},$


which corresponds to weighing two objects, first together in one pan, then in
opposite pans, and then weighing one alone. Calling $b_1$ the weight of the object
that has been on the scale through all three weighings and $b_2$ the other we have
the estimates


        $b_1 = (y_1 + y_2 + y_3)/3$
        $b_2 = (y_1 - y_2)/2,$

with respective variances

        $\sigma_1^2 = \sigma^2/3, \qquad \sigma_2^2 = \sigma^2/2.$


Thus the first weight is determined with the minimum possible variance but the
second is not.

An alternative method of weighing under these same conditions is to weigh
both objects in one pan together twice and to weigh them in opposite pans once.
This gives


        $X = \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ 1 & -1 \end{pmatrix},$


with the normal equations

        $3b_1 + b_2 = y_1 + y_2 + y_3$
        $b_1 + 3b_2 = y_1 + y_2 - y_3,$

whose solution is

        $b_1 = (y_1 + y_2 + 2y_3)/4$
        $b_2 = (y_1 + y_2 - 2y_3)/4,$


and variances


        $\sigma_1^2 = \sigma_2^2 = \tfrac{3}{8}\sigma^2.$


Thus the weights are by this method determined with equal accuracy, which is
better than by the preceding method for one of the objects but worse for the
other. To choose between the two methods it is therefore appropriate to take
into consideration the relative accuracy desired in the weights of the two objects.
Either method is better than weighing the objects separately.
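The comparison between the two three-weighing schemes follows from the rule that the variance of $b_1$ is $\sigma^2$ times the ratio of the minor to the determinant of the normal-equation matrix. The Python sketch below is illustrative; the function name is invented, and the third design (one object weighed twice, the other once) is included as an assumed reading of "weighing the objects separately".

```python
from fractions import Fraction


def variances(x):
    """Variances of b1 and b2 in units of sigma^2 for a two-column design.

    Uses sigma^2 * (minor / determinant) of the normal-equation matrix.
    """
    a11 = sum(r[0] * r[0] for r in x)
    a12 = sum(r[0] * r[1] for r in x)
    a22 = sum(r[1] * r[1] for r in x)
    det = a11 * a22 - a12 * a12
    return Fraction(a22, det), Fraction(a11, det)


first = [[1, 1], [1, -1], [1, 0]]     # together, opposite pans, first alone
second = [[1, 1], [1, 1], [1, -1]]    # together twice, opposite pans once
separate = [[1, 0], [1, 0], [0, 1]]   # one object twice, the other once

results = {
    "first": variances(first),        # (1/3, 1/2)
    "second": variances(second),      # (3/8, 3/8)
    "separate": variances(separate),  # (1/2, 1)
}
```

Exact fractions are used so that the figures can be compared with those in the text without rounding.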

Either of these two $X$ matrices can also be made the basis for weighing a single
object when the scale is suspected of having a bias. The weight of this object
will be estimated as $b_2$, and will have the variance $\tfrac{1}{2}\sigma^2$ by the first method, or
$\tfrac{3}{8}\sigma^2$ by the second. Thus the second method is distinctly superior in this case.

Orthogonality between columns obviously requires both negative and positive
signs, corresponding to weighings in both pans of the balance. Thus the experimental
designs of maximum efficiency for weighing on a balance are not
available with a spring scale, or in making measurements of any kind in which
it is not possible to arrange that the quantities read off are differences. In such
cases the elements of $X$ are restricted to be 1 or 0. Let us now consider some of
the simplest cases of this kind, assuming for simplicity that $\sigma = 1$. We shall
deal only with cases in which there is no bias.

For N = 3, p = 2 the simple experiment of weighing one object twice and the
other once yields variances $\tfrac{1}{2}$ and 1 respectively. All other designs are in this
case less satisfactory, with the possible exception of that specified by

$$X = \begin{bmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix},$$

with $b_1 = (y_1 + 2y_2 - y_3)/3$ and $b_2 = (y_1 - y_2 + 2y_3)/3$ having each the
variance $\tfrac{2}{3}$.
For N = 3, p = 3 the most efficient design is given by

$$X = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix},$$

with $b_1 = (y_1 + y_2 - y_3)/2$, and $b_2$ and $b_3$ given by cyclic permutation in this
formula. The variance of each unknown is $\tfrac{3}{4}$.

For N = 4, p = 3 a design having an advantage in some situations is that
given by

$$X = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{bmatrix}$$

(together of course with those obtained by permutations of rows and of columns,
as is to be understood throughout). The normal equations are

$$3b_1 + 2b_2 + 2b_3 = y_1 + y_2 + y_3$$
$$2b_1 + 3b_2 + 2b_3 = y_2 + y_3 + y_4$$
$$2b_1 + 2b_2 + 3b_3 = y_1 + y_3 + y_4.$$


An expeditious method of solution in this as in many similar cases is to add them
all together and then subtract an appropriate multiple of the sum from each
of the normal equations in turn. The variance of each unknown found by this
experiment is 5/7 = .714. The simple experiment consisting of weighing one of
the objects twice and the others once each yields a variance smaller than this
in one case (1/2) and larger in two cases (1).
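
The "add them all together" device can be sketched as follows, assuming only the left-hand sides of the normal equations given above; the function names are illustrative. Each equation can be written as $b_i + 2(b_1 + b_2 + b_3) = t_i$, so adding the three gives $7(b_1 + b_2 + b_3) = t_1 + t_2 + t_3$, and subtracting 2/7 of that sum from each right-hand side isolates $b_i$.

```python
from fractions import Fraction


def solve_by_summing(totals):
    """Solve 3b_1 + 2b_2 + 2b_3 = t_1 and its two companions.

    Each equation reads b_i + 2*(b_1 + b_2 + b_3) = t_i, and the sum of
    the three equations gives b_1 + b_2 + b_3 = (t_1 + t_2 + t_3) / 7.
    """
    total = sum(totals)
    return [Fraction(t) - Fraction(2, 7) * total for t in totals]


def satisfies_normal_equations(totals):
    """Substitute the solution back into the three normal equations."""
    b = solve_by_summing(totals)
    lhs = [3 * b[i] + 2 * sum(b[j] for j in range(3) if j != i)
           for i in range(3)]
    return lhs == [Fraction(t) for t in totals]


# With right-hand side (1, 0, 0) the solution is the first column of the
# inverse matrix; its leading entry 5/7 is the variance factor in the text.
print(solve_by_summing([1, 0, 0]))
print(satisfies_normal_equations([10, 12, 9]))
```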

For N = p = 4 the cyclic arrangement

$$X = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \end{bmatrix}$$

leads to variances all equal to 7/9.
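
The spring-scale variances quoted above (2/3, 3/4 and 7/9, with σ = 1) can be reproduced exactly. This is a sketch under stated assumptions: the 0/1 matrices below are written out to match the estimates and variances in the text, the cyclic 4 × 4 matrix being one arrangement that gives 7/9, and the helper names are illustrative.

```python
from fractions import Fraction


def inverse(matrix):
    """Exact Gauss-Jordan inverse of a small non-singular matrix."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_factors(design):
    """Diagonal of (X'X)^-1 for a design whose rows are weighings."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    inv = inverse(xtx)
    return [inv[i][i] for i in range(p)]


# 1 = object is on the spring scale in that weighing, 0 = it is not.
designs = {
    "N=3, p=2": [(1, 1), (1, 0), (0, 1)],
    "N=3, p=3": [(1, 1, 0), (1, 0, 1), (0, 1, 1)],
    "N=p=4 cyclic": [(1, 1, 1, 0), (0, 1, 1, 1), (1, 0, 1, 1), (1, 1, 0, 1)],
}

for name, design in designs.items():
    print(name, variance_factors(design))
```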
For N = 5, p = 2 the most efficient design appears to be 


0 


The variance of each unknown is in this case 1/3. 


REFERENCES 


[1] R. A. Fisher, The Design of Experiments, Oliver and Boyd, London and Edinburgh,
1935 and 1937.

[2] F. Yates, "Complex experiments," Jour. Roy. Stat. Soc. Supp., Vol. 2 (1935), pp. 181-
247, including discussion. The reference is to p. 211.

[3] R. A. Fisher, "The sampling error of estimated deviates ...," Mathematical Tables,
Vol. 1 (1931), pp. xxvi-xxxv. London, British Association for the Advancement
of Science.

[4] P. C. Tang, "The power function of the analysis of variance tests with tables and
illustrations of their use," Stat. Res. Mem., Vol. 2 (1939), pp. 126-149.

[5] Cf. Whittaker and Robinson, Calculus of Observations, Section 115.

[6] Harold Hotelling, "Problems of prediction," Am. Jour. of Sociology, Vol. 48 (1942),
pp. 61-76.







ON THE ANALYSIS OF A CERTAIN SIX-BY-SIX FOUR-GROUP
LATTICE DESIGN¹


By Boyd Harshbarger²


Virginia Agricultural Experiment Station 


1. Introduction. The lattice consists of groups of randomized incomplete
blocks with certain restrictions being imposed on the randomization within each
group, and the number of varieties is a perfect square. For example, if the
number of varieties is $k^2 = 36$, then the orthogonal groups for a triple lattice, not
considering randomizing within the blocks or between blocks, are as follows
(the numbers signify varieties):


GROUP X
Blocks
(1)    1   2   3   4   5   6
(2)    7   8   9  10  11  12
(3)   13  14  15  16  17  18
(4)   19  20  21  22  23  24
(5)   25  26  27  28  29  30
(6)   31  32  33  34  35  36

GROUP Y
Blocks
(1)    1   7  13  19  25  31
(2)    2   8  14  20  26  32
(3)    3   9  15  21  27  33
(4)    4  10  16  22  28  34
(5)    5  11  17  23  29  35
(6)    6  12  18  24  30  36

GROUP Z
Blocks
(1)    1   8  15  22  29  36
(2)    2   9  16  23  30  31
(3)    3  10  17  24  25  32
(4)    4  11  18  19  26  33
(5)    5  12  13  20  27  34
(6)    6   7  14  21  28  35


This design is constructed so that no variety appears with another variety 
more than once in the same block. This important characteristic makes the 
analysis simple, as it enables the results to be treated as a factorial design. The 
analysis is well described by Yates [3, 4, 5] and Cochran [1].

Suppose another group, U, is now formed from a six by six lattice, for example, 
the following group: 


1 Certain of the ideas presented here are embodied in the author's unpublished doctoral
thesis by the same title, Library, George Washington University, Washington, D. C., 1943.

2 The author wishes to express his appreciation to W. G. Cochran of Iowa State College,
who advised freely in the preparation of the original thesis, and to Frank M. Weida of
George Washington University.


GROUP U
Blocks
(1)   31  26  21  16  11   6
(2)    1  32  27  22  17  12
(3)    7   2  33  28  23  18
(4)   13   8   3  34  29  24
(5)   19  14   9   4  35  30
(6)   25  20  15  10   5  36


The important characteristic, that no variety appears with another variety in 
the same block more than once, does not hold. For example, varieties 1 and 22 
appear together in both groups Z and U. 
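
The overlap pattern behind this remark can be checked mechanically. The sketch below assumes the varieties are numbered 1 to 36 row by row in a 6 × 6 square, with groups X and Y the rows and columns and groups Z and U the two sets of diagonals; that construction and the function names are illustrative assumptions, chosen to reproduce the group listings.

```python
from collections import Counter
from itertools import combinations

K = 6  # block size; varieties 1..36 are laid out row by row in a 6 x 6 square


def variety(row, col):
    return K * row + col + 1


group_x = [{variety(r, c) for c in range(K)} for r in range(K)]               # rows
group_y = [{variety(r, c) for r in range(K)} for c in range(K)]               # columns
group_z = [{variety(r, (b + r) % K) for r in range(K)} for b in range(K)]      # diagonals
group_u = [{variety(r, (b - 1 - r) % K) for r in range(K)} for b in range(K)]  # anti-diagonals


def overlap_sizes(first, second):
    """Distinct numbers of varieties shared by a block of each group."""
    return sorted({len(a & b) for a in first for b in second})


def repeated_pairs(groups):
    """Pairs of varieties that share a block more than once."""
    counts = Counter(pair for group in groups for block in group
                     for pair in combinations(sorted(block), 2))
    return sorted(pair for pair, n in counts.items() if n > 1)


print(overlap_sizes(group_x, group_z))   # every X block meets every Z block once
print(overlap_sizes(group_z, group_u))   # Z and U blocks share 0 or 2 varieties
print(repeated_pairs([group_x, group_y, group_z, group_u])[:3])
```

Under this construction the triple lattice X, Y, Z has no repeated pair, while adding U produces repeated pairs such as (1, 22).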

It is the purpose of this paper to develop the statistical method for the analysis 
of such a design, where each group is duplicated, and to apply the results to an 
actual problem. The least square solution, as developed here, uses only the 
intra-block information to correct the varieties for block effects. In a second 
article the solution using both intra- and inter-block information will be given. 


2. Estimation of the block and varietal effects. It is reasonable to assume
in varietal trials that the general mean, and any particular block and variety
effects, operate additively to produce the true mean of y associated with this
block and variety. In particular, if $y_{eij}$ is the yield of the plot for the jth variety
in the ith block of the eth replicate, the following hypothesis may be set up,
namely:

(1)    $y_{eij} = \mu + \rho_e + \beta_{ei} + \nu_j + \epsilon_{eij}$,

where $\mu$ is the true or population mean yield, $\rho_e$ is the population replicate effect
of the eth replicate, $\beta_{ei}$ is the population block effect of the ith block in the eth
replicate, $\nu_j$ is the population variety effect of the jth variety, and $\epsilon_{eij}$ is the
experimental error of the eij plot. Since the design has eight replicates, that is,
each group is duplicated, the block effects may be estimated from unpaired and
paired replicates or partners.

It is assumed that the $\epsilon_{eij}$ are independently and normally distributed with
common variance $\sigma^2$. Without loss of generality, it also may be assumed that
the sum of the replicate effects, the sum of the block effects within any
replication, and the sum of the variety effects are respectively equal to zero.

The parameters are estimated by the method of least squares, subject to the 
restrictions stated in the preceding paragraph. This involves choosing the 
parameters so that 


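(2)    $S\left\{ y_{eij} - m x_1 - r_e x_{2e} - \frac{b_{ei} - b'_{ei}}{2} x_{3ei} - \frac{b_{ei} + b'_{ei}}{2} x_{4ei} - v_j x_{5j} \right\}^2$
       $\qquad + \lambda_1 \sum_{e=1}^{8} r_e + \lambda_2 \sum_{e=x}^{u} \sum_{i=1}^{6} \frac{b_{ei} - b'_{ei}}{2} + \lambda_3 \sum_{e=x}^{u} \sum_{i=1}^{6} \frac{b_{ei} + b'_{ei}}{2} + \lambda_4 \sum_{j=1}^{36} v_j$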





is a minimum.³ Here $y_{eij}$ is the dependent variate, and $x_1, \cdots, x_{5j}$ are the
independent variates. In ordinary regression problems, the values of the x
variates, as well as the y variate, constitute a part of the original data.
However, in this case the y variate only is given, and the x variates must be
constructed. Thus, for the design, one takes $x_1 = 1$ for all values; $x_{2e} = 1$ for all
values in replicate e, but zero elsewhere; $x_{3ei} = 1$ for all values in the ith block
of the eth replicate and $-1$ for all values in its partner, but zero elsewhere;
$x_{4ei} = 1$ for all values in the ith block of the eth replicate and also 1 in its partner,
but zero elsewhere; and $x_{5j} = 1$ where variety j occurs, but zero elsewhere.

One now takes the partial derivatives of the above equation with respect to
the parameters and forms the normal equations. It can be shown that the
multipliers $\lambda$ are each zero. The normal equations not involving $\lambda$'s are:

(3)
Leading term

$m$:    $Nm + k^2 \sum_{e=1}^{8} r_e + 2k \sum_{e=x}^{u} \sum_{i=1}^{6} \frac{b_{ei} + b'_{ei}}{2} + 8 \sum_{j=1}^{36} v_j = G.$

$r_e$:    $k^2 m + k^2 r_e + k \sum_{i=1}^{6} b_{ei} + \sum_{j=1}^{36} v_j = R_e.$

$\frac{b_{ei} - b'_{ei}}{2}$:    $k(r_e - r'_e) + 2k\,\frac{b_{ei} - b'_{ei}}{2} = B_{ei} - B'_{ei}.$

Equations having $\frac{b_{ei} + b'_{ei}}{2}$ as leading terms.

Equations having $v_j$ as leading terms.

In the above, N is the total number of values, k is the number of plots in a
block, $r_e$ is the eth replicate effect, $b_{ei}$ is the ith block effect in the eth replicate,
$v_j$ is the jth variety effect, G is the total sum of all values, $R_e$ is the sum of the
values in the eth replicate, $B_{ei}$ is the sum of values in the ith block of the eth
replicate and $V_j$ is the sum of the yields of the jth variety. The primes denote
similar values of the partner terms.

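Using the restrictions

(4)    $\sum_{e=1}^{8} r_e = 0, \qquad \sum_{i=1}^{6} b_{ei} = 0, \qquad \sum_{j=1}^{36} v_j = 0,$

the values of the following parameters are directly obtainable:

(5)    $r_e = \frac{R_e}{k^2} - \frac{G}{N}$  and  $\frac{b_{ei} - b'_{ei}}{2} = \frac{B_{ei} - B'_{ei}}{2k} - \frac{r_e - r'_e}{2}.$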





3 S, for summation, will be used to represent summation over all values. $\sum$ will be used
in a more restricted sense.





The values of the $\frac{b_{ei} + b'_{ei}}{2}$ and $v_j$ effects cannot be obtained directly. In
order to simplify the solution of these equations, only the mean, confounded
blocks, and the varietal effects will be used. The results later will be corrected
for replicate effects. If each of the yields is added to the corresponding yield
in its partner, one gets an equation of the form

(6)    $y_{ij} = \mu + \beta_i + \nu_j + \epsilon_{ij}$

where $\mu$, $\beta_i$, and $\nu_j$ are now twice their original values. These parameters are
estimated by the method of least squares subject to the restrictions previously
given. To distinguish them from the estimates derived in equation (3) they are
designated with double primes (''). If $B_{xi}$ is the total for block i in group X,
that is from both pairs, and similarly for the other groups, the normal equations
are:


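(7)
Leading term

$m''$:    $144m'' + 6 \sum_{e=x}^{u} \sum_{i=1}^{6} b''_{ei} + 4 \sum_{j=1}^{36} v''_j = G$

$b''_{x1}$:    $6m'' + 6b''_{x1} + (v''_1 + v''_2 + v''_3 + v''_4 + v''_5 + v''_6) = B_{x1}$
  $\cdots$
$b''_{u6}$:    $6m'' + 6b''_{u6} + (v''_5 + v''_{10} + v''_{15} + v''_{20} + v''_{25} + v''_{36}) = B_{u6}$

$v''_1$:    $4m'' + b''_{x1} + b''_{y1} + b''_{z1} + b''_{u2} + 4v''_1 = V_1$
  $\cdots$
$v''_{36}$:    $4m'' + b''_{x6} + b''_{y6} + b''_{z1} + b''_{u6} + 4v''_{36} = V_{36}.$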
Let $T_{xi}$ be the total of all the varieties appearing in block i of group X from all
the replicates; $T_{yi}$ the total of all values appearing in block i of group Y from all
the replicates, etc. Also

(8)    $C_{xi} = 4B_{xi} - T_{xi}$;  $C_{yi} = 4B_{yi} - T_{yi}$;  etc.


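Solving the equations:

$$b''_{xi} = \frac{C_{xi}}{18}, \qquad b''_{yi} = \frac{C_{yi}}{18},$$

and

(9)    $b''_{z1} = \frac{1}{432}\left[(25C_{z1} + C_{z3} + C_{z5}) + 3(C_{u2} + C_{u4} + C_{u6})\right]$
       $b''_{z2} = \frac{1}{432}\left[(25C_{z2} + C_{z4} + C_{z6}) + 3(C_{u1} + C_{u3} + C_{u5})\right]$
       $\cdots$
       $b''_{u6} = \frac{1}{432}\left[(25C_{u6} + C_{u2} + C_{u4}) + 3(C_{z1} + C_{z3} + C_{z5})\right].$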
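
A consistency check on the Z and U solutions in (9) can be sketched as follows. Two points are assumptions of this sketch rather than statements from the paper: the divisor 432, which is what the algebra yields when the varietal constants are eliminated from the normal equations (7), and the reduced relation $18b''_{zi} - 2\sum b''_{uj} = C_{zi}$ (sum over the three U blocks sharing two varieties with Z block i, with the X and Y block constants summing to zero), which is derived here in the same way. The C values below are arbitrary test numbers, not data.

```python
from fractions import Fraction


def solve_zu(c_z, c_u):
    """Block constants for groups Z and U from the six C totals of each group.

    Blocks are indexed 0..5; a Z block and a U block share two varieties
    exactly when their indices have opposite parity.
    """
    def one(own, other, i):
        same = [j for j in range(6) if j % 2 == i % 2]
        opposite = [j for j in range(6) if j % 2 != i % 2]
        # 25*C_i plus the other two same-parity C's, plus 3 * overlapping C's.
        numerator = (24 * own[i] + sum(own[j] for j in same)
                     + 3 * sum(other[j] for j in opposite))
        return Fraction(numerator, 432)

    return ([one(c_z, c_u, i) for i in range(6)],
            [one(c_u, c_z, i) for i in range(6)])


def reduced_equations_hold(c_z, c_u):
    """Check 18*b_i - 2*(sum of overlapping partner-group b's) = C_i."""
    b_z, b_u = solve_zu(c_z, c_u)
    for own_b, other_b, own_c in ((b_z, b_u, c_z), (b_u, b_z, c_u)):
        for i in range(6):
            partners = sum(other_b[j] for j in range(6) if j % 2 != i % 2)
            if 18 * own_b[i] - 2 * partners != own_c[i]:
                return False
    return True


# Arbitrary illustrative totals, not taken from the experiment.
print(reduced_equations_hold([5, -3, 8, 1, -7, 2], [4, 9, -6, 0, 3, -2]))
```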


The values of b'''s calculated as above contain the replicate effects. To
correct for this, adjust the values so that the sum of the block effects for each
replicate is zero. From the corrected b'' values and the normal equations with
the v'''s as leading terms, the corrected varietal sums are calculated. These are:

(10)    $4v''_j + 4m'' = V_j - \sum b''_{ei}$  (sum over blocks in which $v_j$ appears)

where $4v''_j + 4m''$ is the corrected varietal total for the jth variety.

















3. Test of significance and the analysis of variance. If the x's have the
previously defined values, the following identity occurs:

(11)    $Sy^2 = mG + \sum_{e=1}^{8} r_e R_e + \sum_{e=x}^{u} \sum_{i=1}^{6} \frac{b_{ei} - b'_{ei}}{2}(B_{ei} - B'_{ei}) + \sum_{e=x}^{u} \sum_{i=1}^{6} \frac{b_{ei} + b'_{ei}}{2}(B_{ei} + B'_{ei}) + \sum_{j=1}^{36} v_j V_j$
        $\qquad + S\left(y - m x_1 - r_e x_{2e} - \frac{b_{ei} - b'_{ei}}{2} x_{3ei} - \frac{b_{ei} + b'_{ei}}{2} x_{4ei} - v_j x_{5j}\right)^2.$

In the equation (11)

(12)    $mG + \sum_{e=1}^{8} r_e R_e + \sum_{e=x}^{u} \sum_{i=1}^{6} \frac{b_{ei} - b'_{ei}}{2}(B_{ei} - B'_{ei}) + \sum_{e=x}^{u} \sum_{i=1}^{6} \frac{b_{ei} + b'_{ei}}{2}(B_{ei} + B'_{ei}) + \sum_{j=1}^{36} v_j V_j$

is the reduction in the sum of squares due to regression and

(13)    $S\left(y - m x_1 - r_e x_{2e} - \frac{b_{ei} - b'_{ei}}{2} x_{3ei} - \frac{b_{ei} + b'_{ei}}{2} x_{4ei} - v_j x_{5j}\right)^2$

is the residual sum of squares. The reductions attributable to

(14)    $\sum_{e=x}^{u} \sum_{i=1}^{6} \frac{b_{ei} - b'_{ei}}{2}(B_{ei} - B'_{ei})$  and  $\sum_{e=x}^{u} \sum_{i=1}^{6} \frac{b_{ei} + b'_{ei}}{2}(B_{ei} + B'_{ei})$

will be designated as component (a) and component (b) respectively. The
residual mean square will be denoted by $s^2$.

The common test required is that of the null hypothesis that the variety effects
$v_1, v_2, \cdots, v_{36}$ are all zero. This test is made by calculating the reduction ($R_1$)
to the sum of squares due to the regression on all variates, and the reduction ($R_2$)
due to the regression on all variates, except the variety effects. $R_1 - R_2$ is called
the additional reduction to the sum of squares due to the v's after fitting the
remaining variates.

The ratio $(R_1 - R_2)/(82 - 47)s^2$ is distributed as F, as shown by Yates [6],
with 35 and 205 degrees of freedom. The 35, 205, 82, and 47 degrees of freedom
pertain respectively to varieties, error, all constants fitted, and the total constants
less the constants for varieties.


Referring to formula (11) and the parameter effects, the sums of squares in the
"Analysis of Variance Table" follow directly for replicate and component (a).
Nair [2] in his recent article gives in detail the method for getting out the reduction
to the sum of squares for the entangled components. He shows that the
reductions for component (b) and the varieties may be written as:


(15)    $\frac{1}{8} \sum_{e=x}^{u} \sum_{i=1}^{6} b''_{ei} C_{ei}$  and  $\sum_{j=1}^{36} \frac{V_j^2}{8} - \frac{G^2}{N}$

where the b'' have been corrected for replicate effects. It is well to note that
$\frac{1}{8} \sum_{e=x}^{u} \sum_{i=1}^{6} b''_{ei} C_{ei}$ is the reduction due to intra-block effects, freed of varietal effects.


The reduction due to varieties corrected for block effects is given by 


(16) Po yee + (HU - ? “)- ap ‘©. 


e=z i=l e=z i=l 


This reduction can be used for testing the variety effects. 


ANALYSIS OF VARIANCE TABLE

Source                          D/F         Sum of squares
Replicate                       (8-1)       $\sum_{e=1}^{8} R_e^2/k^2 - G^2/N$
Component (a)                   4(6-1)      $\sum_{e=x}^{u} \sum_{i=1}^{6} (B_{ei} - B'_{ei})^2/2k - \sum_{e=x}^{u} (R_e - R'_e)^2/2k^2$
Component (b)                   4(6-1)      $\frac{1}{8} \sum_{e=x}^{u} \sum_{i=1}^{6} b''_{ei} C_{ei}$
Varieties (ignoring blocks)     (36-1)      $\sum_{j=1}^{36} V_j^2/8 - G^2/N$
Error                           205         obtained by subtraction
Total                           8(36)-1     $Sy^2 - G^2/N$

4. Standard error of adjusted varietal means. For obtaining the standard
error of the difference between two varieties adjusted by the intra-block
information, this difference between two varieties can be expressed as a linear function
of the plot yields. The standard error of the difference then can be obtained
from the standard error of a linear function. To obtain the coefficients, it is
well to draw a sketch of the plots, and put the coefficient of each plot on the
diagram. In this way the proper multipliers can be found in a convenient
manner.

First consider the case where the two varieties appear together in the same
block in both groups Z and U. One such pair consists of varieties (1) and (22),
for which the varietal effects are designated by $v_1$ and $v_{22}$. From equation (10)
we have:

(17)    $4v''_1 = V_1 - 4m'' - b''_{x1} - b''_{y1} - b''_{z1} - b''_{u2}$
        $4v''_{22} = V_{22} - 4m'' - b''_{x4} - b''_{y4} - b''_{z1} - b''_{u2}.$
















The linear function of the difference between the varietal effects is:

(18)    $4(v''_{22} - v''_1) = V_{22} - V_1 + b''_{x1} - b''_{x4} + b''_{y1} - b''_{y4}$

where

$$b''_{x1} = \frac{1}{3k}(4B_{x1} - T_{x1}), \qquad b''_{x4} = \frac{1}{3k}(4B_{x4} - T_{x4}),$$
$$b''_{y1} = \frac{1}{3k}(4B_{y1} - T_{y1}), \qquad b''_{y4} = \frac{1}{3k}(4B_{y4} - T_{y4}).$$

The multipliers [except for the common factor 4 shown on the left of equation
(18)] are:





















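Number of Plots     Multipliers            Contribution to variance
4                   $\pm\frac{3k+2}{3k}$    $\frac{36k^2 + 48k + 16}{W(9k^2)}$
4                   $\pm\frac{3k-2}{3k}$    $\frac{36k^2 - 48k + 16}{W(9k^2)}$
4                   $\pm\frac{4}{3k}$       $\frac{64}{W(9k^2)}$
4(k - 2)            $\pm\frac{3}{3k}$       $\frac{36(k - 2)}{W(9k^2)}$
12(k - 2)           $\pm\frac{1}{3k}$       $\frac{12(k - 2)}{W(9k^2)}$
Total                                      $\frac{72k^2 + 48k}{W(9k^2)}$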


The variance per plot of the difference between two varietal means for varieties
which occur together in the same block in groups Z and U is:

(19)    $\frac{72k^2 + 48k}{2W(16)(9k^2)} = \frac{3k + 2}{12Wk}$

and for k = 6 is $\frac{5}{18W}$.

Similarly the variance per plot of the difference between varietal means for
other combinations are as follows:

Combinations                                  Formula for k = 6
both in same block in groups Z or U           $\frac{7}{24W}$
both in same block in groups X or Y           $\frac{8}{27W}$
not together in the same block                $\frac{67}{216W}$

5. Numerical Analysis. (a) The data. In order to illustrate the application
of the method developed, an experiment used to test the yields of 36 hybrid corn
varieties is presented. This experiment was carried out on the Arlington
Experimental Farm, and the results are used through the courtesy of A. E. Brandt⁴
and M. H. Jenkins.⁵

Except for randomization, the plot yields are as shown in tables I to IV. 

(b). Calculation for analysis of variance table. From page 9, the total sum of 
squares, the sum of squares for replicates, and the sum of squares for varieties 
ignoring blocks are obtained by substitution. They are: 


$$(90.9)^2 + (81.4)^2 + \cdots + (101.0)^2 - c = 33{,}546.92$$

$$\frac{1}{36}\sum_{e=1}^{8} R_e^2 - c = 2{,}289.68$$

$$\frac{(741.2)^2 + \cdots + (743.1)^2}{8} - c = 15{,}825.09$$

$$c = \frac{(25{,}935.9)^2}{288} = 2{,}335{,}662.80$$

The block sum of squares, eliminating varieties, is made up of two parts,
component (a) and component (b). From the formula on page 9 the reduction
for component (a) is:

$$\frac{(559.2 - 540.2)^2 + (547.0 - 522.4)^2 + \cdots + (515.8 - 507.7)^2}{12}$$
$$\qquad - \frac{(3291.8 - 3300.2)^2 + (3256.5 - 3304.7)^2 + \cdots + (3284.6 - 2978.2)^2}{72} = 1{,}415.27.$$

Component (b) consists of differences giving an estimate of block yield freed of
varietal effects. The C's are first calculated by using formula (8) and the
results are as follows:

$$C_{x1} = (4B_{x1} - T_{x1}) = 4(1{,}099.4) - 4203.2 = 194.4, \text{ etc.}$$


The b's are calculated by using formulas given by (9), and then correcting for
replicate effects by imposing the conditions that

$$\sum_{i=1}^{6} b''_{ei} = 0 \qquad (e = x, \cdots, u).$$

The corrected b's are:

$$b_1 = 6.79556, \quad b_2 = -3.83777, \quad \cdots, \quad b_{24} = -0.34306.$$


4 Acting Chief, Conservation Experiment Station Divisions, Office of Research, Soil
Conservation Service, U.S.D.A.

5 Principal Agronomist, in charge of Corn Investigations, Bureau of Plant Industry,
U.S.D.A.





TABLE I 


GROUP X 
Plot Yield Replicate 1 


7.2) - 

















98.8; (95.5) 88.1! 110.2] 


516.0) 585.2; 545.4 584.1) 3300.3 











315 








13 


TABLE II 


GROUP Y 
Plot Yield Replicate 2 


| 25 


85.4 





Plot Yield Replicate 2’ 


79.0 | 


19 


| 


86.4 | 


20 


88.1 


21 


| 19 
85.4 | 





99.7 | 


| 25 


74.3 | 


26 


27 


88.8 




















TABLE III 
GROUP Z 
Yield Replicate 3 


22 | 29 


99.4 | 





| 30 
90.3 | 








je 
| 25 
92.3 | 


73.9 | 


z 


| 
| 


| 


31 


90.3 | 548.8 
| 
| 


94.1 | 534.4 
sa aoe 


97.2 | 540.2 





19 
125.1 


13 | 20 | 27 
73.9 S44 | 

| 21 | 28 
85.4) 95.4 | 


| 


102.1 | 


| 
| 
| 


93. 


Plot Yield Replicate 3’ 




















66.7 | 


98.1 | 599.6 
ern 
34 | 
89.8 492.8 

| 35 
101.6 | 557.5 


| 


| 


3273.3 





2 


69.0 | 525.7 
31 
103.6 | 554.4 
32 
103.4 | 591.5 
33 
98.1 | 





572.3 
86.9 | 475.4 


527.2 





3246.5 





TABLE IV 
GROUP U 
Plot Yield Replicate 4 





11 | 6 


90.9 102.5) 91.5 | 580.3 


| 12 | 
85.6 100.6 | 543.6 





| 18 | 
88.9 | 115.7 | 565.2 





| 24 | 
91.3 | | 534.0 





| ad 
545.7 





515.8 











Plot Yield Replicate 4’ 








| 16 iil | | 
84.8 | 94.5 | 6 | 84.4 | 508.7 
| 12 | 
75.8 | 83.4 | 491.4 


| 
| snapeaccctsctacreniesaa 


18 | 
94.7 | 544.2 





| 


87.6 102.1 | 456.7 











97.1 | 73.3 101.0 | 507.7 


| | 2978.2 











To get the reduction due to component (b) the above results are substituted in

$$\frac{1}{8} \sum_{e=x}^{u} \sum_{i=1}^{6} b''_{ei} C_{ei} = 1{,}389.96.$$

The necessary results are now available for the "Analysis of Variance Table."
THE ANALYSIS OF VARIANCE TABLE

Source of Variation               Degrees of Freedom   Sum of Squares   Mean Square
Replications                               7               2,289.68        327.097
Component (a)                             20               1,415.27
Component (b)                             20               1,389.96
Blocks (eliminating varieties)            40               2,805.23
Varieties (ignoring blocks)               35              15,825.09
Error                                    205              12,626.92         61.595
Total                                    287              33,546.92


(c). Test of significance. There frequently will be large differences between
varieties so that a test of significance may not be needed. If a test is needed,
one involving only intra-block information may be used. For this purpose, it is
necessary to calculate the sum of squares for varieties eliminating block effects
as shown by formula (15): 13,946.28. The mean square will be 399.893, and
$F = \frac{399.893}{61.595} = 6.49$, which is highly significant.
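
The arithmetic of the table and of the F ratio can be retraced in a few lines. This is a sketch that uses only the figures printed above; the dictionary layout and variable names are illustrative.

```python
# (degrees of freedom, sum of squares) as printed in the analysis of variance table
rows = {
    "Replications": (7, 2289.68),
    "Component (a)": (20, 1415.27),
    "Component (b)": (20, 1389.96),
    "Varieties (ignoring blocks)": (35, 15825.09),
    "Error": (205, 12626.92),
}
TOTAL_DF, TOTAL_SS = 287, 33546.92

df_sum = sum(df for df, _ in rows.values())
ss_sum = sum(ss for _, ss in rows.values())

replication_ms = rows["Replications"][1] / rows["Replications"][0]
error_ms = rows["Error"][1] / rows["Error"][0]

# Mean square for varieties eliminating blocks, as quoted in the text.
VARIETY_MS = 399.893
f_ratio = VARIETY_MS / error_ms

print(df_sum, round(ss_sum, 2))
print(round(replication_ms, 3), round(error_ms, 3), round(f_ratio, 2))
```

The component sums add to the total line, and the ratio of the quoted mean square to the error mean square gives the F value on 35 and 205 degrees of freedom.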

(d). Corrected varietal totals and means. The right-hand side of equation (10)
gives the corrected variety totals, and when divided by eight gives the varietal
means. These corrected varietal means can then be compared to determine the
best variety. The corrected varietal totals and means are:


Corrected Varietal Totals 





669.05 652.00 | .80 | 672.04) 720.32 





8 


9 | il | 12 
658.5 | 


739.54 | 735.95 


. 
| 
| 
A 


co. Wl a 
14 | 17 | 18 


700.44 | | 686.41; 713.21 | 730.79 | 857.26 





| 20 | | 23 | 24 
675.40 756.25 | 801.34 | 619.46 | 868.84 
| } | 
—| fo 
| 27 | 28 | 29 
699.83 | 567.71 | 814.04 ae 
| | 
| 





| 26 | 30 
763.51 | 679.44 
35 36 

780.48 760.37 


| 32 | 33 | 34 
757.48 42 | 726.05 











92.902 | 


7 
93.449 


13 


BOYD HARSHBARGER 


Corrected Varietal Means 





| 3 
83.631 
9 
82.322 


81.500 | 


83.071 | 


| 
88.225 | 84.005 90 .049 


12 
92.442 91.994 


5 


93.924 | 





| 18 
91.349 | 


89.151 | 107 . 158 





| 23 24 


100.168 | 77 .432 | 108 . 605 





| 29 | 30 
101.755 | 95.439 84.930 
| -_ $$. 


25 


88.021 87.479 | 70.964 | 


31 | | 33 | 34 
90.224 94.685 97.927 | 


| 35 | 36 
90.756 | 97 .560 95.046 


When comparing one variety with another it is necessary to know the standard
error of the mean difference, in order to judge whether this difference is
significant. The formulas for the standard error of the difference between mean yields
differ for those sets of varieties which occur together in the same block in groups
Z and U, in groups Z or U, in groups X or Y, or do not occur together in the
same block. The formulas for these calculations are, respectively,
(19), (20), (21), and (22). For example, the standard error of the
difference between two variety means, such as variety 1 and 2, is

$$\sqrt{\tfrac{8}{27}(61.595)} = 4.27.$$
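
The same calculation can be repeated for each of the four kinds of variety pairs. In this sketch the factors are the k = 6 values given in Section 4, with 1/W replaced by the error mean square 61.595; the dictionary keys and function names are illustrative.

```python
from fractions import Fraction
from math import sqrt

ERROR_MEAN_SQUARE = 61.595  # residual mean square from the analysis of variance table


def factor_same_block_z_and_u(k):
    """Formula (19): (3k + 2) / (12k), the multiple of 1/W."""
    return Fraction(3 * k + 2, 12 * k)


# Variance factors for k = 6, as multiples of 1/W.
factors = {
    "same block in Z and U": factor_same_block_z_and_u(6),
    "same block in Z or U": Fraction(7, 24),
    "same block in X or Y": Fraction(8, 27),
    "never in the same block": Fraction(67, 216),
}


def standard_error(factor, mean_square=ERROR_MEAN_SQUARE):
    """Standard error of a difference between two adjusted varietal means."""
    return sqrt(float(factor) * mean_square)


for label, factor in factors.items():
    print(label, factor, round(standard_error(factor), 2))
```

Varieties 1 and 2 share block 1 of group X, so the factor 8/27 applies to them.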


REFERENCES 

[1] G. M. Cox, R. C. Eckhardt and W. G. Cochran, "The analysis of lattice and triple
lattice experiments in corn varietal tests," Iowa Agr. Exp. Sta. Res. Bul., Vol. 281
(1940).

[2] K. R. Nair, "A note on the method of fitting constants," Sankhya, Vol. 5 (1941), pp.
317-328.

[3] F. Yates, "The principles of orthogonality and confounding in replicated experiments,"
Journal Agr. Sci., Vol. 23 (1933), pp. 108-145.

[4] F. Yates, "Complex experiments," Suppl. Jour. Roy. Stat. Soc., Vol. 2 (1935), pp. 181-
247.

[5] F. Yates, "A new method of arranging variety trials involving a large number of
varieties," Journal Agr. Sci., Vol. 26 (1936), pp. 424-455.

[6] F. Yates, "Orthogonal functions and tests of significance in the analysis of variance,"
Journal Royal Stat. Soc., Vol. 5 (1938), pp. 177-180.







NOTES 


This section is devoted to brief research and expository articles on methodology 
and other short items. 




ON THE EXPECTED VALUES OF TWO STATISTICS


By H. E. Robbins


Post Graduate School, Annapolis, Md.


In a previous paper [1], the following theorem was proved. Let X be a random,
Lebesgue measurable subset of Euclidean m-dimensional space $E_m$, and let
$\mu(X)$ be the measure of X. For every point x of $E_m$ let p(x) be the probability
that X contains x. Then

(1)    $E(\mu(X)) = \int_{E_m} p(x)\, d\mu(x).$


In the present note we shall show how this theorem may be used to find the 
expected values of two statistics which arise in sampling theory. Applications 
to similar problems may suggest themselves to the reader. 

Let Y be a real random variable with c. d. f. (cumulative distribution function)
$\sigma(y)$, so that for every y,

(2)    $\Pr(Y < y) = \sigma(y).$

Let $Y_1, \cdots, Y_n$ be n independent random variables, each with the distribution
of Y. Finally, let

(3)    $A = \min(Y_1, \cdots, Y_n),$
       $B = \max(Y_1, \cdots, Y_n),$
       $R = B - A,$
       $F = \sigma(B) - \sigma(A).$

Although the values of E(F) and E(R) can be found from the sampling
distributions of F and R, and, in fact, are well known, we shall show how to apply (1)
to find E(F) and E(R) directly.

To find the first of these, let X denote the set of points in the interval 0 < x < 1
such that

(4)    $\sigma(A) < x < \sigma(B).$

Then X is a random set with measure

(5)    $\mu(X) = F.$

Moreover, for any point x the probability that X shall contain x is clearly

(6)    $p(x) = 1 - x^n - (1 - x)^n.$





Hence by (1),

(7)    $E(\mu(X)) = \int_0^1 \{1 - x^n - (1 - x)^n\}\, dx = \frac{n - 1}{n + 1}.$

Thus by (5),

(8)    $E(F) = \frac{n - 1}{n + 1}.$

This result may also be derived by the usual method. In fact, it is not hard
to show that the c. d. f. of F is

(9)    $\tau(f) = \Pr(F < f) = (1 - n)f^n + nf^{n-1}$  for  $0 \le f \le 1$,

whence

(10)    $E(F) = \int_0^1 f\, d\tau(f) = (1 - n)n \int_0^1 f^n\, df + n(n - 1) \int_0^1 f^{n-1}\, df$
        $\qquad = \frac{(1 - n)n}{n + 1} + \frac{n(n - 1)}{n} = \frac{n - 1}{n + 1}.$

Here the advantage of using (1) is only that it makes unnecessary the calculation
of the c. d. f. $\tau(f)$, provided that only E(F) is desired.
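
The integral in (7) is easy to confirm exactly for particular n. The sketch below expands $1 - x^n - (1 - x)^n$ by the binomial theorem and integrates term by term over [0, 1]; the function name is an illustrative choice.

```python
from fractions import Fraction
from math import comb


def expected_coverage(n):
    """Integrate 1 - x^n - (1 - x)^n over [0, 1] term by term."""
    coeffs = [Fraction(0)] * (n + 1)      # coeffs[k] multiplies x^k
    coeffs[0] += 1
    coeffs[n] -= 1
    for k in range(n + 1):
        coeffs[k] -= comb(n, k) * (-1) ** k   # subtract the expansion of (1 - x)^n
    return sum(c / (k + 1) for k, c in enumerate(coeffs))


for n in (2, 3, 5, 10):
    print(n, expected_coverage(n), Fraction(n - 1, n + 1))
```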

The situation is quite otherwise with E(R). Here the c. d. f. of R is

(11)    $\theta(r) = \Pr(R < r) = n(n - 1) \int_{-\infty}^{\infty} \varphi(a) \int_a^{a+r} \varphi(b) \left[ \int_a^b \varphi(t)\, dt \right]^{n-2} db\, da,$

where $\varphi$ is the probability density function of Y. Unless $\varphi$ is a very simple
function, it would seem difficult to find a simple expression for the integral

(12)    $E(R) = \int_0^{\infty} r\, d\theta(r).$

However, if we let X now denote the linear set

(13)    $A < t < B,$

then

(14)    $\mu(X) = B - A = R.$

The probability that X shall contain the point t is now

(15)    $p(t) = 1 - \sigma^n(t) - (1 - \sigma(t))^n,$

so that, by (1) and (14),

(16)    $E(R) = \int_{-\infty}^{\infty} \{1 - \sigma^n(t) - (1 - \sigma(t))^n\}\, dt.$

This formula for the expected value of the range in a sample of n from a
population Y with c. d. f. $\sigma(t)$ is believed to be new.





If $\sigma(t)$ is such that $dt/d\sigma$ can be found as an explicit function of $\sigma$, then (16)
can be written with advantage as

(17)    $E(R) = \int_0^1 \{1 - \sigma^n - (1 - \sigma)^n\}\, \frac{dt}{d\sigma}\, d\sigma.$

For example, suppose the random variable Y has the probability density function

(18)    $\varphi(y) = \frac{e^y}{(1 + e^y)^2},$

and hence the c. d. f.

(19)    $\sigma(y) = \frac{1}{1 + e^{-y}}.$

Then

(20)    $t = \log \frac{\sigma}{1 - \sigma}, \qquad \frac{dt}{d\sigma} = \frac{1}{\sigma(1 - \sigma)}.$

Hence from (17), the expected value of the range in a sample of n is

(21)    $E(R) = \int_0^1 \frac{1 - \sigma^n - (1 - \sigma)^n}{\sigma(1 - \sigma)}\, d\sigma.$

The indicated division in the integrand may be carried out, and the result, a
polynomial in $\sigma$ of degree $\le (n - 2)$, when integrated between 0 and 1, gives
an explicit formula for E(R). Thus for samples of n = 2, 3, 4 we find the values
of E(R) to be respectively 2, 3, 11/3. Incidentally, it is always true that the
expected value of the range for n = 3 is three-halves that for n = 2. This
follows from (16) and the algebraic identity

(22)    $\{1 - \sigma^3 - (1 - \sigma)^3\} = \tfrac{3}{2}\{1 - \sigma^2 - (1 - \sigma)^2\}.$
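
The division and integration described for (21) can be carried out exactly. This is a sketch with an illustrative function name: it builds the polynomial $1 - \sigma^n - (1 - \sigma)^n$, divides out $\sigma$ and then $(1 - \sigma)$, and integrates the quotient over [0, 1].

```python
from fractions import Fraction
from math import comb


def logistic_range_mean(n):
    """E(R) for a sample of n >= 2 from the density e^y / (1 + e^y)^2."""
    # Coefficients of 1 - s^n - (1 - s)^n, index = power of s.
    p = [Fraction(0)] * (n + 1)
    p[0] += 1
    p[n] -= 1
    for k in range(n + 1):
        p[k] -= comb(n, k) * (-1) ** k
    assert p[0] == 0            # no constant term, so s divides the polynomial
    q = p[1:]                   # quotient after dividing by s

    # Divide q by (1 - s): if q = (1 - s) * d then d_k = q_k + d_{k-1}.
    d, carry = [], Fraction(0)
    for coeff in q[:-1]:
        carry = coeff + carry
        d.append(carry)
    assert q[-1] + carry == 0   # the division leaves no remainder

    # Integrate the quotient polynomial between 0 and 1.
    return sum(c / (k + 1) for k, c in enumerate(d))


for n in (2, 3, 4, 5):
    print(n, logistic_range_mean(n))
```

For n = 2, 3, 4 this returns 2, 3 and 11/3, and the ratio of the n = 3 value to the n = 2 value is 3/2, in line with identity (22).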


REFERENCES 


[1] H. E. Robbins, "On the measure of a random set," Annals of Math. Stat., Vol. 15
(1944), pp. 70-74.


ON RELATIVE ERRORS IN SYSTEMS OF LINEAR EQUATIONS
By A. T. Lonseth


Northwestern University 


Some time ago in these Annals¹, L. B. Tuckerman discussed the effect of
relative coefficient errors on relative solution errors for a non-singular linear algebraic
system; his discussion was confined to errors so small that their squares and higher
powers can be neglected. Dr. Tuckerman's paper was principally concerned
with the important problem of limiting errors incurred while solving the system,
and has suggested the desirability of a non-infinitesimal treatment of relative
errors. Such a treatment follows; the method is a variant of that used in a
paper on absolute errors². The computations provide (1) a criterion for
allowable relative errors in the coefficients to assure non-vanishing of the
determinant; (2) a bound (subject to this criterion) for the relative error in each
solution; (3) a specification of accuracy in the coefficients to assure a pre-assigned
accuracy in the solution.

1 L. B. Tuckerman, "On the mathematically significant figures in the solution of
simultaneous linear equations," Annals of Math. Stat., Vol. 12 (1941), pp. 307-316.
We consider a system of linear equations

(1)    $\sum_{j=1}^{n} a_{ij} x_j = c_i, \qquad i = 1, 2, \cdots, n,$

where none of the following vanish: the n(n + 1) coefficients $a_{ij}$, $c_i$; the
determinant $\Delta$ of the system; and the n solution-components $x_i$. Under these
conditions it is possible to speak of "relative errors" in the a's, c's, and x's. Let
$\epsilon_{ij}$, $\sigma_i$ be the relative errors in $a_{ij}$ and $c_i$ respectively, so that $a_{ij}$ is perturbed to
$a_{ij}(1 + \epsilon_{ij})$; $c_i$ to $c_i(1 + \sigma_i)$. We inquire as to limitations on $\epsilon_{ij}$ and $\sigma_i$ which
will permit solution for $x_j(1 + \rho_j)$ of the system

(2)    $\sum_{j=1}^{n} a_{ij}(1 + \epsilon_{ij})\, x_j (1 + \rho_j) = c_i(1 + \sigma_i), \qquad i = 1, 2, \cdots, n,$

where $\rho_j$ is the relative error induced in $x_j$; and we seek to limit $|\rho_j|$ in terms
of the $\epsilon$'s and $\sigma$'s. We shall assume that for all i, j

(3)    $|\epsilon_{ij}|, |\sigma_i| \le \delta,$

where $\delta$ will be suitably restricted later.
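Combining (1) and (2) we get

(4)    $\sum_{j=1}^{n} a_{ij} x_j \rho_j = c_i \sigma_i - \sum_{j=1}^{n} a_{ij} \epsilon_{ij} x_j - \sum_{j=1}^{n} a_{ij} \epsilon_{ij} x_j \rho_j.$

Since by hypothesis the determinant $\Delta$ of (1) is not zero, matrix $A = (a_{ij})$ has
the inverse $A^{-1} = (b_{ij}) = (\Delta_{ji}/\Delta)$, where $\Delta_{ij}$ is the cofactor of $a_{ij}$ in $\Delta$.
Multiplying (4) by $b_{ki}$ and summing on i we get

(5)    $x_k \rho_k = \sum_{i=1}^{n} b_{ki} c_i \sigma_i - \sum_{i=1}^{n} b_{ki} \sum_{j=1}^{n} a_{ij} \epsilon_{ij} x_j - \sum_{i=1}^{n} \sum_{j=1}^{n} b_{ki} a_{ij} \epsilon_{ij} x_j \rho_j$

and by (3)

(6)    $|\rho_k| \le \delta(M_k + N_k \rho),$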

2 A. T. Lonseth, "Systems of linear equations with coefficients subject to error," Annals
of Math. Stat., Vol. 13 (1942), pp. 332-337.














where $\rho$ is the greatest $|\rho_k|$, and

(7)    $M_k = \frac{1}{|x_k|} \sum_{i=1}^{n} |b_{ki}| \left( |c_i| + \sum_{j=1}^{n} |a_{ij} x_j| \right),$

(8)    $N_k = \frac{1}{|x_k|} \sum_{i=1}^{n} \sum_{j=1}^{n} |b_{ki} a_{ij} x_j|,$

so that

$$M_k = N_k + \frac{1}{|x_k|} \sum_{i=1}^{n} |b_{ki} c_i|.$$

Denote by M, N the maximum values of $M_k$, $N_k$ respectively over $k = 1, 2, \cdots, n$.
From (6),

$$\rho \le \delta(M + N\rho),$$

whence, if

(9)    $\delta < 1/N,$

it follows that

(10)    $\rho \le \delta M/(1 - \delta N),$

which of course bounds each individual $|\rho_k|$, though rather crudely. To
bound $|\rho_k|$ more genuinely in terms of $\delta$, $M_k$, $N_k$, M and N it remains only to
use this inequality with (6):

(11)    $|\rho_k| \le \delta(M_k + \delta M N_k/(1 - \delta N)), \qquad k = 1, 2, \cdots, n,$

with $M_k$, $N_k$ as given in (7) and (8).
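
A small numerical illustration of the bound (11) may help. Everything specific in this sketch is an assumption made for illustration: the 2 × 2 system, the value δ = 1/100, and the function names. It computes $M_k$ and $N_k$ from (7) and (8), evaluates the bound, and compares it with the largest relative error found when every coefficient is perturbed by exactly ±δ.

```python
from fractions import Fraction
from itertools import product

# Illustrative system: 4x + y = 6, 2x + 3y = 8, with solution x = 1, y = 2.
A = [[Fraction(4), Fraction(1)], [Fraction(2), Fraction(3)]]
C = [Fraction(6), Fraction(8)]


def solve2(a, rhs):
    """Cramer's rule for a 2 x 2 system."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(rhs[0] * a[1][1] - a[0][1] * rhs[1]) / det,
            (a[0][0] * rhs[1] - rhs[0] * a[1][0]) / det]


def inverse2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def relative_error_bounds(a, rhs, delta):
    """Right-hand side of (11) for each k, provided delta < 1/N as in (9)."""
    x, b, n = solve2(a, rhs), inverse2(a), 2
    n_k = [sum(abs(b[k][i] * a[i][j] * x[j]) for i in range(n) for j in range(n))
           / abs(x[k]) for k in range(n)]
    m_k = [n_k[k] + sum(abs(b[k][i] * rhs[i]) for i in range(n)) / abs(x[k])
           for k in range(n)]
    big_m, big_n = max(m_k), max(n_k)
    assert delta < 1 / big_n
    return [delta * (m_k[k] + delta * big_m * n_k[k] / (1 - delta * big_n))
            for k in range(n)]


def worst_corner_errors(a, rhs, delta):
    """Largest |rho_k| over all choices of +delta or -delta for each coefficient."""
    x = solve2(a, rhs)
    worst = [Fraction(0), Fraction(0)]
    for s in product((-1, 1), repeat=6):
        pa = [[a[0][0] * (1 + s[0] * delta), a[0][1] * (1 + s[1] * delta)],
              [a[1][0] * (1 + s[2] * delta), a[1][1] * (1 + s[3] * delta)]]
        pc = [rhs[0] * (1 + s[4] * delta), rhs[1] * (1 + s[5] * delta)]
        y = solve2(pa, pc)
        worst = [max(w, abs(y[k] / x[k] - 1)) for k, w in enumerate(worst)]
    return worst


DELTA = Fraction(1, 100)
print([float(v) for v in relative_error_bounds(A, C, DELTA)])
print([float(v) for v in worst_corner_errors(A, C, DELTA)])
```

In this example the observed relative errors stay a little below the bounds, which is what (11) requires.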
If (9) holds, then, it follows that $|\rho_k|$ is bounded by (11), if $\rho_k$ exists. This
essential point can be established by solving (5) for $\rho_k$ by iteration²; (9) is a
sufficient condition for convergence of the resulting series, and hence for
non-singularity of the perturbed matrix $(a_{ij} + \epsilon_{ij} a_{ij})$.

In order to be sure that $|\rho_k| \le \eta$, a pre-assigned number, for all k, it suffices
by (10) to choose $\delta$ so that

$$\delta M/(1 - \delta N) \le \eta,$$

whence

$$\delta \le \eta/(M + N\eta).$$

A less simple inequality whose satisfaction by $\delta$ will guarantee that $|\rho_k| \le \eta$
follows from (11), namely

$$\delta \le (A - B)/2C,$$

where $A = \{(M_k - N\eta)^2 + 4MN_k\eta\}^{1/2}$, $B = M_k + N\eta$, $C = |MN_k - NM_k|$.


A RECIPROCITY PRINCIPLE FOR THE NEYMAN-PEARSON THEORY
OF TESTING STATISTICAL HYPOTHESES


By Louis M. Court


In contrasting the tested hypothesis $H_1$ with the alternative $H_2$, i.e., in
comparing the probability distribution $p(x_1, \cdots, x_n \mid H_1)$ associated with the
first hypothesis with the distribution $p(x_1, \cdots, x_n \mid H_2)$ associated with the
second, Neyman and Pearson select the best critical region R* from the infinite
set of critical regions R of a specified size $\alpha$ by minimizing the probability

(1)    $\int_{S-R} p(x_1, \cdots, x_n \mid H_2)\, dx_1 \cdots dx_n,$

of accepting $H_1$ when $H_2$ is true (Type II Errors) subject to the constancy of the
probability of rejecting $H_1$ when $H_1$ is correct (Type I Errors),

(2)    $\int_R p(x_1, \cdots, x_n \mid H_1)\, dx_1 \cdots dx_n = \alpha.$

S in (1) represents the whole of variate $(x_1, \cdots, x_n)$ space and S - R, the
complement of R relative to S.
Obviously (1) is conditionally minimized when 


(3) [ ple, +++, 2e| Ha) dey ++ de, 
R 


is maximized subject to (2). Neyman and Pearson have shown that if one or 
more members of the one parameter (A) family of regions R*(A) defined by the 
inequalities 


(4) p(a1,°-++,4%n| He) = Ap(m, --+ , tn | Ai) 


satisfy the “‘side” condition (2), they will be best critical regions maximizing (3) 
subject to (2) or minimizing (1) subject to (2)'. As suggested by the notation, 
the family R*(\) depends upon X and, if sufficient restrictions are inrposed upon 
p(a1,---,2%n| Hy) and p(m, --- , tn | He), there is one and only one region for 
every value of \ lying in an interval contained in the positive half-axis. , 
itself, is clearly a function (a) of a. Consequently, R*(A) depends upon a 
and may be written as R*[a]. The best critical region for a preassigned size 
& is given by R*[a]. 

Will we get the same best critical region if among the regions $T$ that fix the
probability of Type II Errors,

(5) $\int_{S-T} p(x_1, \cdots, x_n \mid H_2)\, dx_1 \cdots dx_n = 1 - \beta$,

we find the one that minimizes the integral in (2) with $R$ replaced by $T$, i.e.
if we find the one that minimizes the probability of Type I Errors? We shall
call this turnabout of the usual process the reversed Neyman-Pearson principle.

¹ For a full exposition, see Neyman and Pearson, Stat. Res. Memoirs, Vol. 1 (1936).


To discover the answer, we note that $\int_{S} p(x_1, \cdots, x_n \mid H_2)\, dx_1 \cdots dx_n$ is equal
to unity and (5) may be rewritten as

(6) $\int_{T} p(x_1, \cdots, x_n \mid H_2)\, dx_1 \cdots dx_n = \beta$.
" 


The regions that minimize the left side of (2) with $R$ replaced by $T$ subject to
(6) are obviously identical with those that maximize the negative of this left
side subject to (6) multiplied through by $-1$. The latter problem is formally
identical with the one referred to in the second paragraph of this note and, in-
voking Neyman and Pearson's result, we conclude that the said conditional
maximization is effected by the one parameter ($\mu$) family of regions $T^*(\mu^{-1})$
defined by the inequalities

(7) $-p(x_1, \cdots, x_n \mid H_1) \ge -\mu\, p(x_1, \cdots, x_n \mid H_2)$
or $p(x_1, \cdots, x_n \mid H_2) \ge \dfrac{1}{\mu}\, p(x_1, \cdots, x_n \mid H_1)$.

$\mu^{-1}$ in $T^*(\mu^{-1})$ denotes the reciprocal of $\mu$. It is clear from (4) and (7) that the
families of regions $R^*(\lambda)$ and $T^*(\mu^{-1})$, satisfying the direct and reversed Neyman-
Pearson processes, coincide.

As before, $\mu$ is some function $\mu(\beta)$ of $\beta$. Hence, $\beta$ is a function $\beta(\mu)$² of $\mu$ and,
accordingly, a function $\beta\left[\dfrac{1}{\lambda(\alpha)}\right]$ of $\alpha$. Consequently, if the level at which the
probability of Type II Errors in the reversed Neyman-Pearson process is held
constant is taken equal to $1 - \beta\left[\dfrac{1}{\lambda(\alpha)}\right]$ in terms of the level $\alpha$ at which the
probability of Type I Errors is fixed in the direct Neyman-Pearson method, the
reversed and direct processes yield the same best critical region. This is the
reciprocity principle alluded to in the title of this note in its full completeness.

² $\beta(\mu^{-1})$ will generally be distinct in form from $\alpha(\lambda)$, although the second line in (7) coin-
cides upon the substitution of $\lambda$ for $\mu^{-1}$ with (4), since the integrand in (5) is
$p(x_1, \cdots, x_n \mid H_2)$ whereas that in (2), regarded as a constraint in the direct process, is
$p(x_1, \cdots, x_n \mid H_1)$.
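
The reciprocity can be checked numerically in a simple case. The sketch below is an illustration under stated assumptions, not part of the note: a single observation $x$, with $H_1$: $x \sim N(0, 1)$ and $H_2$: $x \sim N(1, 1)$. Here the ratio $p(x \mid H_2)/p(x \mid H_1) = e^{x - 1/2}$ increases with $x$, so the regions defined by (4) and by (7) are both half-lines $\{x \ge c\}$. The function names are hypothetical.

```python
from statistics import NormalDist

# Assumed example (not from the note): H1 is N(0, 1), H2 is N(1, 1).
H1 = NormalDist(mu=0.0, sigma=1.0)
H2 = NormalDist(mu=1.0, sigma=1.0)


def direct_cutoff(alpha):
    """Direct process: the half-line {x >= c} of size alpha under H1."""
    return H1.inv_cdf(1.0 - alpha)


def type_two_probability(c):
    """Probability of accepting H1 when H2 is true, for the region {x >= c}."""
    return H2.cdf(c)


def reversed_cutoff(type_two_level):
    """Reversed process: the half-line {x >= c} with the given Type II probability."""
    return H2.inv_cdf(type_two_level)


alpha = 0.05
c_direct = direct_cutoff(alpha)
# Hold the Type II probability at the level produced by the direct process.
c_reversed = reversed_cutoff(type_two_probability(c_direct))
print(c_direct, c_reversed)
```

When the Type II level fed to the reversed process is the one produced by the direct process, the two cutoffs agree up to rounding, which is the coincidence of regions asserted above for this special case.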







AN INEQUALITY DUE TO H. HORNICH 


By Z. W. BIRNBAUM AND HERBERT S. ZUCKERMAN


University of Washington 


H. Hornich¹ proved a theorem on the average risk of the sum of equal insurance
policies. It seems of interest to note that when translated from its actuarial
formulation into the terminology of the calculus of probabilities this theorem
becomes an inequality for mean deviations of random variables, and to present
it with a concise proof in non-actuarial language.

Let $x$ be a random variable with a symmetrical probability distribution, $D_1 =
E(|x|)$ its mean deviation, $x_1, x_2, \cdots, x_n$ independent repetitions of $x$, and
$D_n = E(|x_1 + x_2 + \cdots + x_n|)$ the mean deviation of $x_1 + x_2 + \cdots + x_n$.
Then $D_n$ fulfills the inequality

(1) $D_n \ge \dfrac{D_1 n}{2^{n-1}} \dbinom{n-1}{\left[\frac{n-1}{2}\right]}$,

where $\left[\frac{n-1}{2}\right]$ denotes the greatest integer $\le \frac{n-1}{2}$. If the distribution of $x$ is not
symmetrical but $E(x) = 0$, the inequality becomes

(2) $D_n \ge \dfrac{D_1 n}{2^{n}} \dbinom{n-1}{\left[\frac{n-1}{2}\right]}$.


The proof will be given for a continuous random variable but it clearly holds
quite generally. If $f(x)$ is the probability density of $x$, and $E(x) = 0$, then one
has

(3) $D_1 = \int_{-\infty}^{+\infty} |x|\, f(x)\, dx = 2 \int_{0}^{+\infty} x f(x)\, dx$.


In the expression for $D_n$, the integration over the entire $n$-space $(x_1, x_2, \cdots, x_n)$
may be performed by integrating separately over each of the $2^n$ "octants" which
correspond to the different combinations of signs of the coordinates, and thus one
obtains, for a symmetrical distribution, the estimates

$$
\begin{aligned}
D_n &= \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \Bigl|\sum_{i=1}^{n} x_i\Bigr| \prod_{i=1}^{n} f(x_i)\, dx_i \\
&= \sum_{s=0}^{n} \binom{n}{s} \int \cdots \int_{\substack{\operatorname{sgn} x_1 = \cdots = \operatorname{sgn} x_s = -1 \\ \operatorname{sgn} x_{s+1} = \cdots = \operatorname{sgn} x_n = +1}} \Bigl|\sum_{i=1}^{n} x_i\Bigr| \prod_{i=1}^{n} f(x_i)\, dx_i \\
&\ge 2 \sum_{s=0}^{[(n-1)/2]} \binom{n}{s} \int \cdots \int_{\substack{\operatorname{sgn} x_1 = \cdots = \operatorname{sgn} x_s = -1 \\ \operatorname{sgn} x_{s+1} = \cdots = \operatorname{sgn} x_n = +1}} \Bigl(\sum_{i=s+1}^{n} x_i - \sum_{i=1}^{s} |x_i|\Bigr) \prod_{i=1}^{n} f(x_i)\, dx_i \\
&= 2 \sum_{s=0}^{[(n-1)/2]} \binom{n}{s} (n - 2s)\, \frac{D_1}{2^{n}} \\
&= \frac{D_1 n}{2^{n-1}} \Bigl\{ \sum_{s=0}^{[(n-1)/2]} \binom{n}{s} - 2 \sum_{s=0}^{[(n-1)/2]-1} \binom{n-1}{s} \Bigr\} \\
&= \frac{D_1 n}{2^{n-1}} \Bigl\{ \sum_{s=0}^{[(n-1)/2]} \binom{n-1}{s} + \sum_{s=0}^{[(n-1)/2]-1} \binom{n-1}{s} - 2 \sum_{s=0}^{[(n-1)/2]-1} \binom{n-1}{s} \Bigr\} \\
&= \frac{D_1 n}{2^{n-1}} \binom{n-1}{\left[\frac{n-1}{2}\right]}.
\end{aligned}
$$

¹ Hans Hornich, "Zur Theorie des Risikos," Monatsh. Math. Phys., Vol. 50 (1941), pp.
142-150.

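If $x$ is not symmetrical but $E(x) = 0$, we consider the random variable $x'$
with the probability density $g(x') = f(-x')$, and the random variable $y =
x + x'$. In view of (3) we find

$$
\begin{aligned}
E(|y|) &= \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} |x + x'|\, f(x)\, g(x')\, dx\, dx' \ge 2 \int_{0}^{+\infty} \int_{0}^{+\infty} (x + x')\, f(x)\, g(x')\, dx\, dx' \\
&= 2 \int_{0}^{+\infty} g(x') \int_{0}^{+\infty} x f(x)\, dx\, dx' + 2 \int_{0}^{+\infty} f(x) \int_{0}^{+\infty} x' g(x')\, dx'\, dx \\
&= E(|x|) \Bigl\{ \int_{0}^{+\infty} g(x')\, dx' + \int_{0}^{+\infty} f(x)\, dx \Bigr\} = E(|x|).
\end{aligned}
$$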
one 


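Let $x_1, x_2, \cdots, x_n$ and $x_1', x_2', \cdots, x_n'$ be independent repetitions of $x$ and $x'$,
respectively, and $y_i = x_i + x_i'$. Since $y$ has a symmetrical distribution, an
application of (1) gives

$$
\begin{aligned}
E(|x_1 + x_2 + \cdots + x_n|) &= \tfrac{1}{2}\bigl\{E(|x_1 + \cdots + x_n|) + E(|x_1' + \cdots + x_n'|)\bigr\} \\
&\ge \tfrac{1}{2} E(|x_1 + x_1' + \cdots + x_n + x_n'|) = \tfrac{1}{2} E(|y_1 + \cdots + y_n|) \\
&\ge \frac{1}{2}\, E(|y|)\, \frac{n}{2^{n-1}} \binom{n-1}{\left[\frac{n-1}{2}\right]} \ge E(|x|)\, \frac{n}{2^{n}} \binom{n-1}{\left[\frac{n-1}{2}\right]}.
\end{aligned}
$$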


An application of Stirling's formula shows that the right hand sides in (1) and
(2) are of the order of magnitude of $\sqrt{n}$.

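
The size of the right hand side of (1) can be tabulated directly. The sketch below is an illustration only: it computes the coefficient of $D_1$ in (1) exactly, and compares it with the mean deviation of a sum of independent variables taking the values $+1$ and $-1$ with probability $1/2$ each (a symmetric example chosen here as an assumption, for which $D_1 = 1$), and with $\sqrt{2n/\pi}$. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb, pi, sqrt


def hornich_factor(n):
    """Coefficient of D_1 in (1): n * C(n-1, [(n-1)/2]) / 2^(n-1)."""
    return Fraction(n * comb(n - 1, (n - 1) // 2), 2 ** (n - 1))


def mean_deviation_of_sign_sum(n):
    """Exact E|x_1 + ... + x_n| when each x_i is +1 or -1 with probability 1/2."""
    total = sum(comb(n, k) * abs(n - 2 * k) for k in range(n + 1))
    return Fraction(total, 2 ** n)


for n in (1, 2, 5, 10, 100):
    print(n,
          float(hornich_factor(n)),
          float(mean_deviation_of_sign_sum(n)),
          sqrt(2 * n / pi))
```

For this two-valued example the two exact columns agree for the values of $n$ tried, which suggests that the constant in (1) cannot be improved, and the last column shows the $\sqrt{n}$ growth.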




NOTE ON A LEMMA 


By ABRAHAM WALD 


Columbia University 


In a previous paper on the power function of the analysis of variance test,¹
the author stated the following lemma (designated there as Lemma 2):

LEMMA 2. Let $v_1, \cdots, v_k$ be $k$ normally and independently distributed variates
with a common variance $\sigma^2$. Denote the mean value of $v_i$ by $\alpha_i$ $(i = 1, \cdots, k)$
and let $f(v_1, \cdots, v_k, \sigma)$ be a function of the variables $v_1, \cdots, v_k$ and $\sigma$ which
does not involve the mean values $\alpha_1, \cdots, \alpha_k$. Then, if the expected value of
$f(v_1, \cdots, v_k, \sigma)$ is equal to zero, $f(v_1, \cdots, v_k, \sigma)$ is identically equal to zero,
except perhaps on a set of measure zero.

In the paper mentioned above it was intended to state this lemma for bounded
functions $f(v_1, \cdots, v_k)$ and the lemma was used there only in a case where
$f(v_1, \cdots, v_k)$ is bounded. Through an oversight this restriction on $f(v_1, \cdots, v_k)$
was not stated explicitly.² The published proof of Lemma 2 is adequate if
$f(v_1, \cdots, v_k)$ is assumed to be bounded. From the fact that the moments of a
multivariate normal distribution determine uniquely the distribution it is
concluded there that if for any set $(r_1, \cdots, r_k)$ of non-negative integers

(1) $\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} v_1^{r_1} \cdots v_k^{r_k}\, f(v_1, \cdots, v_k)\, e^{-\frac{1}{2}\sum (v_i - \alpha_i)^2}\, dv_1 \cdots dv_k = 0$

identically in the parameters $\alpha_1, \cdots, \alpha_k$ then $f(v_1, \cdots, v_k)$ must be equal to
zero except perhaps on a set of measure zero. This conclusion is obvious if
$f(v_1, \cdots, v_k)$ is bounded. In fact, from (1) and the boundedness of $f(v_1, \cdots, v_k)$
it follows that there exists a finite value $A$ such that

$g(v_1, \cdots, v_k) = \dfrac{1}{(2\pi)^{k/2}} \Bigl[1 + \dfrac{1}{A} f(v_1, \cdots, v_k)\Bigr] e^{-\frac{1}{2}\sum (v_i - \alpha_i)^2}$

is a probability density function with moments equal to those of the normal
distribution

$\dfrac{1}{(2\pi)^{k/2}}\, e^{-\frac{1}{2}\sum (v_i - \alpha_i)^2}$.

Hence $f(v_1, \cdots, v_k)$ must be equal to zero except perhaps on a set of measure
zero. However, this conclusion is not so immediate if no restriction is imposed
on $f(v_1, \cdots, v_k)$ except that

(2) $\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} |f(v_1, \cdots, v_k)|\, e^{-\frac{1}{2}\sum (v_i - \alpha_i)^2}\, dv_1 \cdots dv_k < \infty$

for all values of the parameters $\alpha_1, \cdots, \alpha_k$. It is the purpose of this note to
prove this. In other words, we shall prove the following proposition:


¹ A. Wald, "On the power function of the analysis of variance test," Annals of Math.
Stat., Vol. 13 (1942), pp. 434.


² I wish to thank Prof. J. Neyman for calling my attention to this omission.







PROPOSITION I. If (2) holds for all values of the parameters $\alpha_1, \cdots, \alpha_k$ and
if for any set $(r_1, \cdots, r_k)$ of non-negative integers equation (1) holds identically
in $\alpha_1, \cdots, \alpha_k$, then $f(v_1, \cdots, v_k)$ must be equal to zero except perhaps on a set
of measure zero.

On the basis of Proposition I and the arguments given on p. 438 of the paper
mentioned before, it can be seen that restriction (2) on the function $f(v_1, \cdots, v_k)$
is sufficient for the validity of Lemma 2.

To prove Proposition I, we shall first show that the following lemma holds.

LEMMA A. If $h(v_1, \cdots, v_k)$ is a probability density function and if

(3) $\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} h(v_1, \cdots, v_k)\, e^{\delta \sum |v_i|}\, dv_1 \cdots dv_k < \infty$

for some $\delta > 0$, then the problem of moments is determined for the moments of the
distribution $h(v_1, \cdots, v_k)$.


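This lemma was proved by G. H. Hardy for $k = 1$.³ I shall prove it for
$k > 1$. Since

(4) $\sum_{n=0}^{\infty} \dfrac{(\delta |v_i|)^{2n}}{(2n)!} \le e^{\delta \sum |v_j|}$,

we obtain from (3)

(5) $\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} h(v_1, \cdots, v_k) \sum_{n=0}^{\infty} \dfrac{\delta^{2n} v_i^{2n}}{(2n)!}\, dv_1 \cdots dv_k < \infty$.

Hence

(6) $\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} h(v_1, \cdots, v_k) \sum_{n=0}^{\infty} \dfrac{\delta^{2n} \sum_{i=1}^{k} v_i^{2n}}{(2n)!}\, dv_1 \cdots dv_k < \infty$.

Denote the $2n$th moment of $v_i$ by $\mu_{2n}^{(i)}$. Because of (3) the moments $\mu_{2n}^{(i)}$ are
finite. Furthermore, denote $\sum_{i=1}^{k} \mu_{2n}^{(i)}$ by $\lambda_{2n}$. Then we obtain from (6)

(7) $\sum_{n=0}^{\infty} \dfrac{\delta^{2n} \lambda_{2n}}{(2n)!} < \infty$.

From (7) it follows that

(8) $\limsup_{n=\infty} \dfrac{\delta^{2n} \lambda_{2n}}{(2n)!} \le 1$

and hence

(9) $\limsup_{n=\infty} \Bigl(\dfrac{\delta^{2n} \lambda_{2n}}{(2n)!}\Bigr)^{1/2n} \le 1$.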

³ See for instance, Shohat and Tamarkin, "The problem of moments," Math. Surveys
No. 1, Amer. Math. Soc., New York, 1943, p. 20.







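Since according to Stirling's formula

$\lim_{n=\infty} (2n)! \Big/ \bigl[(2n)^{2n} e^{-2n} \sqrt{4\pi n}\bigr] = 1$

we obtain from (9)

(10) $\limsup_{n=\infty} \dfrac{\delta e\, \lambda_{2n}^{1/2n}}{2n} \le 1$.

Taking reciprocals we obtain

(11) $\liminf_{n=\infty} \dfrac{2n\, \lambda_{2n}^{-1/2n}}{\delta e} \ge 1$

or

(12) $\liminf_{n=\infty} n\, \lambda_{2n}^{-1/2n} \ge \dfrac{\delta e}{2} > 0$.

But (12) implies the existence of a positive value $\rho$ so that

(13) $\lambda_{2n}^{-1/2n} \ge \dfrac{\rho}{n}$ $(n = 1, 2, \cdots, \text{ad inf.})$

From (13) it follows that

(14) $\sum_{n=1}^{\infty} \lambda_{2n}^{-1/2n} = \infty$.

But (14) is Carleman's sufficient condition for the determinateness of the prob-
lem of moments. Hence Lemma A is proved.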
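
The behaviour described by (12) to (14) can be observed numerically. The sketch below is an illustration under assumptions that are not in the note: it takes $k = 1$ and the density $h(v) = \frac{1}{2} e^{-|v|}$, which satisfies (3) for any $\delta < 1$ and whose $2n$th moment is $(2n)!$, so that $\lambda_{2n} = (2n)!$. The function name is hypothetical.

```python
from math import e, exp, lgamma


def carleman_term(n):
    """lambda_{2n}^(-1/2n) for lambda_{2n} = (2n)!, computed via log-gamma."""
    return exp(-lgamma(2 * n + 1) / (2 * n))


# n * lambda_{2n}^(-1/2n): bounded below by a positive rho, as in (13).
scaled = [n * carleman_term(n) for n in range(1, 2001)]

# Partial sum of the series in (14); it keeps growing, roughly like a
# multiple of the harmonic series.
partial_sum = sum(carleman_term(n) for n in range(1, 2001))

print(min(scaled), scaled[-1], e / 2)
print(partial_sum)
```

In this example the scaled terms stay above their first value $2^{-1/2}$ and approach $e/2$, so the series in (14) diverges at least as fast as a constant multiple of $\sum 1/n$.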

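On the basis of Lemma A we can prove Proposition I as follows: From (2)
we obtain

(15) $\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} |f(v_1, \cdots, v_k)|\, e^{-\frac{1}{2}\sum v_i^2 + \sum \alpha_i v_i}\, dv_1 \cdots dv_k < \infty$

for all values $\alpha_1, \cdots, \alpha_k$. Let $f_1(v) = f(v)$ for all points $v = (v_1, \cdots, v_k)$
for which $f(v) \ge 0$, and $f_1(v) = 0$ for all points $v$ for which $f(v) < 0$. Similarly,
let $f_2(v) = -f(v)$ for all points $v$ for which $f(v) < 0$, and $f_2(v) = 0$ for all points
$v$ for which $f(v) \ge 0$. Then $f_1(v)$ and $f_2(v)$ are non-negative functions and

(16) $f(v) = f_1(v) - f_2(v)$.

From (15) it follows that

(17) $\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f_1(v)\, e^{-\frac{1}{2}\sum v_i^2 + \sum \alpha_i v_i}\, dv_1 \cdots dv_k < \infty$

and

(18) $\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f_2(v)\, e^{-\frac{1}{2}\sum v_i^2 + \sum \alpha_i v_i}\, dv_1 \cdots dv_k < \infty$.

Let

(19) $f_j^*(v) = f_j(v)\, e^{-\frac{1}{2}\sum v_i^2}$ $(j = 1, 2)$.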


Now we shall show that for any positive values $\beta_1, \cdots, \beta_k$

(20) $\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f_j^*(v_1, \cdots, v_k)\, e^{\beta_1 |v_1| + \cdots + \beta_k |v_k|}\, dv_1 \cdots dv_k < \infty$.

In fact, consider the $2^k$ sets $(a_1, \cdots, a_k)$ where $a_i = \pm 1$ $(i = 1, \cdots, k)$.
Denote by $R_{a_1 \cdots a_k}$ the subset of the $k$-dimensional Cartesian space which con-
sists of all points $v = (v_1, \cdots, v_k)$ for which $v_i$ is either zero or signum $v_i$ =
signum $a_i$ $(i = 1, \cdots, k)$. Putting $\alpha_i = a_i \beta_i$, it follows from (17) and (18)
that

(21) $\int_{R_{a_1 \cdots a_k}} f_j^*(v_1, \cdots, v_k)\, e^{\beta_1 |v_1| + \cdots + \beta_k |v_k|}\, dv_1 \cdots dv_k < \infty$.

Since (21) holds for any of the $2^k$ sets $R_{a_1 \cdots a_k}$, equation (20) is proved.
From (1) it follows that

$\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} v_1^{r_1} \cdots v_k^{r_k}\, [f_1^*(v_1, \cdots, v_k) - f_2^*(v_1, \cdots, v_k)]\, dv_1 \cdots dv_k = 0$,

for all non-negative integers $r_1, \cdots, r_k$. Hence, because of (21) and Lemma A
we see that

(22) $f_1^*(v_1, \cdots, v_k) = f_2^*(v_1, \cdots, v_k)$,

except perhaps on a set of measure zero. From (22) it follows that

$f(v_1, \cdots, v_k) = f_1(v_1, \cdots, v_k) - f_2(v_1, \cdots, v_k) = 0$,

except perhaps on a set of measure zero. Hence Proposition I is proved.


A NOTE ON SKEWNESS AND KURTOSIS 
By J. ERNEST WILKINS, JR.


University of Chicago 
It is the purpose of §1 of this paper to prove the following inequality:

(1) $\alpha_4 \ge \alpha_3^2 + 1$.


This inequality seems to have first been stated by Pearson.¹ The inequality
also follows from a result appearing in the thesis of Vatnsdal. Here we give a
proof based on the theory of quadratic forms which seems to be more direct
and more elementary than either of the previous proofs.


¹ "Mathematical contributions to the theory of evolution, XIX; second supplement to
a memoir on skew variation," Phil. Trans. Roy. Soc. (A), Vol. 216 (1916), p. 432.







The inequality (1) obviously shows that $\alpha_4 \ge 1$. It is then natural to ask for
an upper bound for $\alpha_4$. In §2 we shall show that there is no universal upper
bound (independent of the number $N$ of quantities in the distribution) for $\alpha_3$.
In fact we find the actual dependence of the maximum possible value of $\alpha_3$ as
a function of $N$. The form of this function seems to be known but not to have
been rigorously proved before. It then follows from (1) that there is no uni-
versal upper bound for $\alpha_4$.


1. The inequality (1). Let us consider the quadratic form

$G(a, b, c) = \nu_0 a^2 + 2\nu_1 ab + 2\nu_2 ac + \nu_2 b^2 + 2\nu_3 bc + \nu_4 c^2 = N^{-1} \sum (a + bx + cx^2)^2$.

It follows that $G(a, b, c)$ is a positive semi-definite quadratic form. In fact,
if there are at least three distinct values of $x$, then $G(a, b, c)$ is a positive definite
form. Consequently, its discriminant

$\begin{vmatrix} \nu_0 & \nu_1 & \nu_2 \\ \nu_1 & \nu_2 & \nu_3 \\ \nu_2 & \nu_3 & \nu_4 \end{vmatrix}$

must be non-negative. There is no loss of generality in supposing that $\nu_1 = 0$,
$\nu_2 = 1$, in which case we find that

$\begin{vmatrix} 1 & 0 & 1 \\ 0 & 1 & \alpha_3 \\ 1 & \alpha_3 & \alpha_4 \end{vmatrix} \ge 0$.

Expanding the determinant, we get the inequality (1).


We remark that equality holds in (1) if and only if there are only two distinct
values of $x$.
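
Expanding the determinant gives $\alpha_4 - \alpha_3^2 - 1$, and this quantity can be evaluated for any finite set of values. The sketch below is an illustration only; the helper names and the sample data are assumptions made here, and the moments are taken with divisor $N$.

```python
def standardized_moments(xs):
    """Return (alpha_3, alpha_4) for the N quantities xs, using divisor N."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def determinant_gap(xs):
    """Value of the expanded determinant, alpha_4 - alpha_3^2 - 1."""
    a3, a4 = standardized_moments(xs)
    return a4 - a3 ** 2 - 1


# Assumed sample data: five distinct values, then only two distinct values.
print(determinant_gap([1, 2, 3, 4, 10]))
print(determinant_gap([0, 0, 0, 1]))
```

With more than two distinct values the gap is strictly positive, while for the two-valued sample it vanishes up to rounding, in line with the remark on equality.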


2. The maximum value of $\alpha_3$. It is clear that this maximum will be $N^{-1}$
times the maximum value of the function $f(x) = \sum x^3$ on the bounded closed set
consisting of those points $x$ for which $g(x) = \sum x^2 = N$ and $h(x) = \sum x = 0$.
According to the Lagrange multiplier rule, this latter maximum is obtained as
follows. Let $F(x) = f(x) - \lambda g(x) - \mu h(x)$. Then the maximizing point
satisfies the relations

$F_{x_i} = 3x_i^2 - 2\lambda x_i - \mu = 0$, $\sum x^2 = N$, $\sum x = 0$.

The equations $\sum F_{x_i} = 0$, $\sum x_i F_{x_i} = 0$ show that $\mu = 3$, $f_{\max} = 2N\lambda/3$. Solving
the equation $F_{x_i} = 0$ gives

(2) $x_i = [\lambda + e_i(\lambda^2 + 9)^{1/2}]/3$,

where $e_i = \pm 1$. For these values of $x_i$ we shall have $h(x) = 0$, $g(x) = N$ if
and only if

$\lambda = -(\lambda^2 + 9)^{1/2} N^{-1} \sum e$.





Therefore $\lambda$ has the sign opposite to that of $\sum e$, and

$\lambda^2[N^2 - (\sum e)^2] = 9(\sum e)^2$.

It follows that $\sum e \ne \pm N$, and that

(3) $\lambda = -3\sum e/[N^2 - (\sum e)^2]^{1/2}$,
$f_{\max} = -2N\sum e/[N^2 - (\sum e)^2]^{1/2}$.

We have still not obtained the maximum, however, since the minimum will
also satisfy all of the relations deduced above. We distinguish the maximum
from the other critical values by examining the function

$\phi(\Sigma) = -2N\Sigma/(N^2 - \Sigma^2)^{1/2}$.

Since $\sum e \ne \pm N$, $e_i = \pm 1$, it is clear that $N - 2 \ge \sum e \ge 2 - N$. We therefore
consider $\phi(\Sigma)$ on the interval $(2 - N, N - 2)$. We find that

$d\phi/d\Sigma = -2N^3/(N^2 - \Sigma^2)^{3/2} < 0$,

so that $\phi$ is a decreasing function of $\Sigma$ on the interval indicated. Its maximum
value will therefore occur when $\Sigma = 2 - N$, and this maximum value will be

$\phi(2 - N) = N(N - 2)/(N - 1)^{1/2}$.

The value $\sum e = 2 - N$ occurs only when one of the $e_i$, say $e_1$, is equal to $+1$
and all the rest are equal to $-1$. Then we find from (3) and (2) that

$\lambda = 3(N - 2)/2(N - 1)^{1/2}$,
(4) $x_1 = (N - 1)^{1/2}$, $x_2 = x_3 = \cdots = x_N = -(N - 1)^{-1/2}$,
$\alpha_3 = f(x)/N = (N - 2)/(N - 1)^{1/2}$.

Since the maximum value of $\alpha_3$ given by (4) approaches $\infty$ with $N$, it follows
that there is no universal upper bound for $\alpha_3$. More precisely, the quantity
$\alpha_3$ can be made as large as desired by choosing $N$ large enough and then picking
$x_i$ as in the last paragraph. Since there is no universal upper bound for $\alpha_3$,
it is clear from (1) that there is no universal upper bound for $\alpha_4$. It would
probably be possible, although rather difficult, to derive an explicit bound for
$\alpha_4$ as a function of $N$ by using the methods employed above for $\alpha_3$.
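
The extremal configuration (4) is easy to check numerically. The sketch below is an illustration only: it builds the $N$ values of (4), computes their $\alpha_3$ with divisor $N$, and compares it with $(N - 2)/(N - 1)^{1/2}$; the random normal samples and the seed are assumptions introduced here to show that ordinary samples of the same size stay below that value. The function names are hypothetical.

```python
import random
from math import sqrt


def skewness(xs):
    """alpha_3 of the N quantities xs, using divisor N."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def extremal_sample(N):
    """Configuration (4): one value sqrt(N-1), the other N-1 values -1/sqrt(N-1)."""
    r = sqrt(N - 1)
    return [r] + [-1.0 / r] * (N - 1)


def max_skewness(N):
    """Maximum of alpha_3 for N quantities, (N - 2)/sqrt(N - 1)."""
    return (N - 2) / sqrt(N - 1)


N = 10
print(skewness(extremal_sample(N)), max_skewness(N))

# Assumed comparison data: 2000 normal samples of size N with a fixed seed.
rng = random.Random(0)
largest = max(skewness([rng.gauss(0, 1) for _ in range(N)]) for _ in range(2000))
print(largest)
```

For $N = 10$ the extremal value is $8/3$, and the largest skewness among the simulated samples remains below it.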





NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of general interest.

Personal Items

Mr. Carl A. Bennett is doing war research on a project at the University of 
Chicago. 

Dr. George W. Brown, formerly Research Associate at Princeton University, 
is now at the RCA Laboratories in Princeton, New Jersey. 

Professor Harry Carver is on leave of absence from the University of Michigan 
to do Operations Analysis work with the Army Air Forces. 

Mr. George B. Dantzig is now Principal Statistician of the Army Air Forces 
Statistical Control Division at the Pentagon Building in North Arlington, Vir- 
ginia. 

Assistant Professor Preston C. Hammer of Oregon State College has taken a 
leave of absence to aid in setting up statistical methods of quality control for the 
Lockheed Aircraft Corporation of Burbank, California. 

Dr. Tjalling Koopmans is now associated with the Cowles Commission for Re- 
search in Economics at the University of Chicago. 

Dr. Jerome C. R. Li is now an instructor in mathematics at Queens College in
Flushing, New York.

Associate Professor Joe Livers of Montana State College has been granted a 
leave of absence for the summer and fall terms to study at the University of 
Michigan. 

Assistant Professor Eugene Lukacs of Illinois College, Jacksonville, Illinois,
has accepted an associate professorship in mathematics at Berea College, Berea, 
Kentucky. 

Mr. R. I. Piper, who has been on leave from the Southern California Telephone
Company to do war research at the California Institute of Technology, has re-
turned to the Telephone Company as Plant Staff Assistant.

Dr. Henry Scheffé, formerly lecturer in mathematics at Princeton University,
has been appointed to an assistant professorship in mathematics at Syracuse 
University. 

Assistant Professor H. M. Schwartz of the University of Idaho has been ap- 
pointed research fellow at the Bartol Research Foundation in Swarthmore, Penn- 
sylvania. 

New Members 

The following persons have been elected to membership in the Institute: 

Bower, O. K. Ph.D. (Illinois) Asso. in Dept. of Math., Univ. of Illinois. 505 W. John, 
Champaign, Ill. 

Breden, Robert E. B.S. (Kansas State College) Personnel Technician, Statistical Serv- 
ices, Technical Section of the War Dept., 270 Madison Ave., New York, N. Y. 
Brixey, John C. Ph.D. (Chicago) Asso. Prof. of Math., Univ. of Oklahoma. 927 S. Pickard,
Norman, Okla.
Brown, Arthur B. Ph.D. (Harvard) Asst. Prof. of Math., Queens College, Flushing, N. Y. 


Bruyere, Martha (Mrs. Paul T.) Stat., U. S. Public Health Service, Div. of Venereal 
Diseases, Gaithersburg, Md. 




Bruyere, Paul T. M.P.H. (Yale) Stat., U.S. Public Health Service, Bldg. T6, Bethesda, 
Md. 

Carter, Gerald C. Ph.D. (Purdue) Dept. Head, Naval Training School, Purdue Univ. 
530 Garfield St., W. Lafayette, Ind. 

Casanova, Teobaldo. Ph.D. (New York) Res. Stat., Univ. of Puerto Rico, Rio Piedras, 
P. R. 

Chances, Ralph. B.B.S. (C.C.N.Y.) Stat., Industrial Surveys Co., 347 Madison Ave., 
New York, N. Y. 

Cody, Donald D. A.B. (Harvard) Res. Math., Res. Lab., Indianapolis Naval Ordnance 
Plant, Indianapolis, Ind. 

Duvall, George E. Asst. Physicist, UCDWR, U.S. Navy Radio and Sound Lab., San 
Diego 52, Calif. 

Ellis, Wade. Ph.D. (Michigan) Special Instr. in Math., Univ. of Michigan. 921 Wood- 
lawn, Ann Arbor, Mich. 

Field, Robert W. Ph.D. (Illinois) Asso. Prof. of Industrial Engineering, Purdue Univ., 
Lafayette, Ind. 

File, Quentin W. Ph.D. (Purdue) Instr. of Elec. Wiring, Purdue Naval Training School. 
Physics Bldg., Purdue Univ., Lafayette, Ind. 

Freeman, Albert M. Dir. Math. Lab., Boston Fiduciary & Research Associates. Neck 
Rd., Tiverton, R. I. 

Gerlough, Daniel L. B.S. (Calif. Inst. of Tech.) Quality Control Engineer, Plomb Tool 
Co., Box 3519, Terminal Annex, Los Angeles 54, Calif. 

Germond, Hallett H. Ph.D. (Wisconsin) Asso. Prof. of Math., Univ. of Florida, Gaines- 
ville, Fla. (On leave). 

Ghormley, Glen E. Stat. Analyst, Lockheed Aircraft Corp. 139 N. Chester Ave., Pasa- 
dena 4, Calif. 

Grant, Eugene L. A.M. (Columbia) Prof. of Economics of Engineering, Stanford Univ., 
Calif. 

Gunlogson, L.S. B.B.A. (Minnesota) Ensign, USNR. 3616 18th Ave., So., Minneapolis 
7, Minn. 

Hart, Alex L. Ph.D. (Minnesota) Asst. to the Dir., Res. Dept., Eastern Air Lines, Inc. 
141-24 79 Ave., Flushing, L. I., N. Y. 

Kefferstan, William F. Mgr., Economic Research, Boston Fiduciary & Research Associates, 
50 Congress St., Boston, Mass. 

Lyons, Will. B.Sc. (Bucknell) Economist, Gen. Statistics Staff, W.P.B. 2027 Park Rd., 
N.W., Washington 10, D. C. 

McBee, Ethelyne L. M.A. (Columbia) Stat., Dept. of Agric. 2126 N. Stafford St., 
Arlington, Va. 

McIntyre, Donald P. M.A. (Toronto) Meteorologist in Charge, Prince George Airport. 
Box 296, Prince George, B. C., Canada. 

Moss, Judith. B.A. (Vassar) Jr. Math., Stat. Res. Group, Div. of War Research, Columbia 
Univ. 319 St. John’s Pl., Brooklyn 17, N. Y. 

Oosterhof, Willis M. M.A. (Michigan) Stat., Mich. State Dept. of Social Welfare. 
811 Hackett St., Ionia, Mich. 

Piper, Robert I. B.A. (Montana) Res. Asso. Room 204, Astrophysics Bldg., Calif. Inst. 
of Tech., Pasadena 4, Calif. 

Robbins, Herbert E. Ph.D. (Harvard) Lt., USNR. Post Graduate School, Annapolis, 
Md. 

Seeley, Sherwood B. Ch.E. (New York) Dir., Res. and Tech. Div., Joseph Dixon Cru- 
cible Co., 167 Wayne St., Jersey City 3, N. J. 

Simpson, Tracy W. E.E. (Armour Inst. of Tech.) Sales Promotion Mgr., Marchant 
Calculating Machine Co. 2903 Forest Ave., Berkeley, Calif. 

Smith, Edward S. Ph.D. (Virginia) Prof. of Math., Univ. of Cincinnati, Cincinnati 21, 
Ohio. 

Waksberg, Joseph. B.S. (C.C.N.Y.) Economic Analyst, Bureau of the Census. 1422 
Saratoga Ave., N.E., Washington 18, D. C. 





REPORT ON THE WELLESLEY MEETING OF THE INSTITUTE 


The Seventh Summer Meeting of the Institute of Mathematical Statistics was 
held at Wellesley College, Wellesley, Mass., on Saturday and Sunday, August 12 
and 13, 1944, in conjunction with the meetings of the Mathematical Association 
of America and the American Mathematical Society. The following 51 members 
of the Institute attended the meeting: 


T. W. Anderson, H. E. Arnold, K. J. Arnold, L. A. Aroian, A. L. Bailey, J. L. Barnes,
C. I. Bliss, A. H. Bowker, B. H. Camp, C. W. Churchman, W. G. Cochran, T. E. Cope,
J. H. Curtiss, W. E. Deming, P. S. Dwyer, Will Feller, C. D. Ferris, R. M. Foster,
H. A. Freeman, Henry Goldberg, E. J. Gumbel, P. R. Halmos, Harold Hotelling, Truman
Kelley, L. R. Klein, Myra Levine, John Mandel, J. W. Mauchly, Richard v. Mises,
E. B. Mode, Vaclav Myslivec, P. M. Neurath, M. L. Norden, C. O. Oakley, P. S. Olm-
stead, Edward Paulson, Wm. Reitz, S. L. Robinson, F. E. Satterthwaite, Henry Scheffé,
W. A. Shewhart, Andrew Sobczyk, H. W. Steinhaus, Marian Torrey, Mary Torrey, A. W.
Tucker, J. W. Tukey, Abraham Wald, R. M. Walter, Elizabeth Wilson, Jacob Wolfowitz.


The first session was held jointly with the Mathematical Association and con-
sisted of a Symposium on "Potential Opportunities for Statisticians and the
Teaching of Statistics." The President of the Institute, Dr. W. A. Shewhart,
presided. The principal addresses were made by Dr. Shewhart and Professor
Harold Hotelling. Remarks were also made, upon invitation of the Chairman,
by Prof. Milton de Silva Rodrigues of Sao Paulo University in Brazil who is
spending a year in studying methods of teaching statistics in this country, and by
Dr. Vaclav Myslivec, Czechoslovak Delegate to the United Nations Interim
Commission on Food and Agriculture. A lively discussion was under way when
time forced the conclusion of the meeting. There was continued discussion by a
smaller group for some time afterwards.

Professor B. H. Camp acted as Chairman at the Sunday morning session, a 
contributed papers session held jointly with the Association. The following 
papers were presented: 


1. Statistical Tests Based on Permutations of the Observations.
A. Wald and J. Wolfowitz, Columbia University.

2. Error Control in Matrix Calculation.
F. E. Satterthwaite, Aetna Life Insurance Co.

3. On Cumulative Sums of Random Variables.
A. Wald, Columbia University.

4. The Approximate Distribution of the Mean and of the Variance of Independent Variates.
P. L. Hsu, National University of Peking. (Introduced, and presented, by W. Feller,
Brown University).

5. Ranges and Midranges.
E. J. Gumbel, New School for Social Research.

6. Statistics of Sensitivity Data, II. Preliminary report.
C. W. Churchman, Frankford Arsenal, and Benjamin Epstein, Westinghouse Electric
and Manufacturing Co.




President Shewhart presided at the Sunday afternoon session. The follow- 
ing invited addresses were given: 


1. The Problem on Tolerance Limits. 
Lt. J. H. Curtiss, USNR. 

2. Some Improvements in Weighing and Other Experimental Techniques. 
H. Hotelling, Columbia University. 


A business meeting was held at the conclusion of the Sunday afternoon session. 
The Secretary-Treasurer made a brief report dealing with (1) the financial condi- 
tion of the Institute and (2) the membership growth of the Institute. The Pres- 
ident, reporting for the Editor, indicated a need for more papers for the next 
two or three issues of the Annals. The Institute, after some discussion, then 
passed two Amendments to the Constitution and four Amendments to the By- 
Laws. These Amendments are listed in the following section. A resolution 
thanking the officials of Wellesley College was passed. 

A dinner for the three mathematical organizations was held Sunday evening. 
Addresses were made by Captain Mildred H. McAfee and Professor Marshall H. 
Stone. Later in the evening there was a musicale featuring David Barnett. 


P. S. DWYER
Secretary 











AMENDMENTS TO THE CONSTITUTION AND BY-LAWS 
OF THE INSTITUTE 


The following Amendments to the Constitution and By-laws were passed at
the business meeting at Wellesley College on August 13, 1944. The votes of all
voting members who sent ballots to the Secretary-Treasurer prior to the time of
the meeting were counted in the balloting. The Amendments as adopted are
identical with the proposed Amendments which were placed in the hands of the
membership in July:


Amendments to Constitution 


1. Article III. 3. The first sentence, which was:

"The Institute shall have a Committee on Membership composed of three Fellows."

shall be revised to read:

"The Institute shall have a Committee on Membership composed of a Chairman and
three Fellows."


2. Article IV. 3. The first two sentences, which were:

"The Committee on Membership shall hold a meeting immediately after the annual
meeting of the Institute. Further meetings of the Committee may be held from time to
time at the call of the Chairman or any member of the Committee provided notice of such
call and the purpose of the meeting is given to the members of the Committee by the
Secretary-Treasurer at least five days before the date set therefor."

shall be revised to read:

"Meetings of the Committee on Membership may be held from time to time at the call
of the Chairman or any member of the Committee provided notice of such call and the
purpose of the meeting is given to the members of the Committee by the Secretary-
Treasurer at least five days before the date set therefor. Committee business may also
be transacted by correspondence if that seems preferable."


Amendments to By-Laws 


1. Article I. 4. Add the following sentence:

"The power of election to the different grades of Membership, except the grades of
Member and Junior Member, shall reside in the Board."


2. Article I. 5. which was:

"The Committee on Membership shall prepare and make available through the Secre-
tary-Treasurer an announcement indicating the qualifications requisite for the different
grades of membership."

shall have added the following sentences:

"The Committee shall review these qualifications periodically and shall make such
changes in these qualifications and make such recommendations with reference to the
number of grades of membership as it deems advisable. The power to elect worthy
applicants to the grades of Member and Junior Member shall reside in the Committee,
which may delegate this power to the Secretary-Treasurer, subject to such reservations
as the Committee considers appropriate. The Committee shall make recommendations
to the Board of Directors with reference to placing members in other grades of member-
ship. The Committee shall give its attention to the question of increasing the number
of applicants for membership and shall advise the Secretary-Treasurer on plans for that
purpose."


3. After Article II. 1(a) Exception. Add:

"(b) Exception. Any Member or Fellow may make a single payment which will be
accepted by the Institute in place of all succeeding yearly dues and which will not other-
wise alter his status as a Member or Fellow. The amount of this payment will depend
upon the age of this Member or Fellow and will be based upon a suitable mortality table
and rate of interest, to be specified by the Board of Directors."


4. and


"(c) Exception. Any Member or Junior Member of the Institute serving, except as
a commissioned officer, in the Armed Forces of the United States or of one of its allies,
may upon notification to the Secretary-Treasurer be excused from the payment of dues
until the January first following his discharge from the Service. He shall have all privi-
leges of membership except that he shall not receive the Official Journal. However, during
the first year of his resumed regular membership he may have the right to purchase, at
$2.50 per volume, one copy of each volume of the Official Journal published during the
period of his service membership."





ABSTRACTS OF PAPERS 


Presented on August 13, 1944, at the Wellesley meeting of the Institute

1. Statistical Tests Based on Permutations of the Observations. A. WALD and
J. WOLFOWITZ, Columbia University.


It was pointed out by Fisher that statistical tests of exact size, based on permutations 
of the observations, can be carried out without assuming anything about the underlying 
distributions except their continuity. Scheffé has proved that, for an important class of 
hypotheses, these tests are the only ones with regions of exact size. Texts based on permuta- 
tions of the observations have been constructed by Fisher, Pitman, Welch, and the present 
authors. In the present paper, the authors prove a theorem on the limiting normality of 
the distribution, in the universe of permutations, of a class of linear forms. Application 
of this theorem gives the limiting normality (in the universe of permutations, of course) 
of the correlation coefficient, and of a statistic introduced by Pitman to test the difference 
between two means. The limiting distribution of the analysis of variance statistic in the 
universe of permutations is also obtained. 
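
As an illustration of the kind of test described above, the following is a minimal Python sketch of a permutation test that uses the correlation coefficient as its statistic. It is a Monte Carlo approximation (random shuffles instead of a full enumeration of permutations) and is not the authors' construction; the function names, the number of shuffles and the seed are assumptions introduced here. In the universe of permutations the coefficient has mean 0 and variance 1/(n - 1), which gives a simple check on the simulated values.

```python
import math
import random


def pearson_r(x, y):
    """Ordinary product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_test_correlation(x, y, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation test for the correlation coefficient.

    Returns the observed coefficient, an approximate p-value, and the list of
    coefficients computed on the shuffled data (hypothetical helper, not from
    the abstract).
    """
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    y_shuffled = list(y)
    perm_values = []
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)  # one random permutation of the y-values
        perm_values.append(pearson_r(x, y_shuffled))
    extreme = sum(1 for r in perm_values if abs(r) >= abs(observed))
    # The "+1" counts the observed arrangement itself as one permutation.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value, perm_values
```

For moderately large samples a histogram of the returned permutation values should look close to a normal curve with variance 1/(n - 1), in line with the limiting normality stated in the abstract.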


2. Error Control in Matrix Calculation. FRANKLIN E. SATTERTHWAITE, Aetna 
Life Insurance Co. 


The arithmetic evaluation of matrix expressions is often rather complicated. One of the 
causes of this is the fact that relatively minor errors (such as rounding errors) introduced 
in an early step may be magnified to such an extent in succeeding steps that the final result 
is useless. Iterative methods to meet this difficulty have been reviewed very completely 
by Hotelling. In this paper a different approach is taken. Conditions on the norm of a 
matrix are determined so that a Doolittle process will not magnify errors to more than two 
or three decimal places. It is then pointed out that if an approximation to the inverse of 
the matrix is available, most problems can be rearranged so that the required norm conditions 
are met. A Doolittle process may then be used to any number of decimal places with as- 
surance that errors will not accumulate to more than a limited number of decimal places. 
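
The rearrangement mentioned above can be sketched in Python as follows. The abstract does not state the norm condition itself, so this sketch only assumes that the goal is to make the coefficient matrix close to the identity: given an approximate inverse C of A, the system A x = b is replaced by (C A) x = C b, and the maximum row sum of |C A - I| is reported as one possible measure of closeness. The function names and the choice of norm are assumptions made here, not the author's.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def max_row_sum_norm(m):
    """Largest row sum of absolute values (one common matrix norm)."""
    return max(sum(abs(v) for v in row) for row in m)


def precondition(a, rhs, approx_inverse):
    """Rearrange a x = rhs into (C a) x = C rhs and measure how far C a is from I."""
    n = len(a)
    ca = matmul(approx_inverse, a)
    c_rhs = [sum(c * r for c, r in zip(row, rhs)) for row in approx_inverse]
    deviation = [[ca[i][j] - (1.0 if i == j else 0.0) for j in range(n)]
                 for i in range(n)]
    return ca, c_rhs, max_row_sum_norm(deviation)


def doolittle_solve(a, rhs):
    """Solve a x = rhs by a Doolittle factorization a = L U (unit diagonal in L).

    No pivoting is done, which is reasonable only when a is close to the
    identity, as after the rearrangement above.
    """
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lower[i][i] = 1.0
        for j in range(i, n):
            upper[i][j] = a[i][j] - sum(lower[i][k] * upper[k][j] for k in range(i))
        for j in range(i + 1, n):
            lower[j][i] = (a[j][i] - sum(lower[j][k] * upper[k][i]
                                         for k in range(i))) / upper[i][i]
    # Forward substitution: L y = rhs.
    y = [0.0] * n
    for i in range(n):
        y[i] = rhs[i] - sum(lower[i][k] * y[k] for k in range(i))
    # Back substitution: U x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(upper[i][k] * x[k] for k in range(i + 1, n))) / upper[i][i]
    return x
```

A typical use would call `precondition` first, inspect the returned deviation norm, and only then pass the rearranged matrix and right-hand side to `doolittle_solve`.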


3. On Cumulative Sums of Random Variables. A. WALD, Columbia University.


Let $\{z_i\}$ ($i = 1, 2, \ldots$ ad inf.) be a sequence of independent random variables each having
the same distribution. Denote by $Z_j$ the sum of the first $j$ elements of the sequence. Let
$a > 0$ and $b < 0$ be two constants and denote by $n$ the smallest integer for which either
$Z_n \ge a$ or $Z_n \le b$. Neglecting the quantity by which $Z_n$ may differ from $a$ or $b$ (this can be
done if the mean value of $|z_i|$ is small), the probability that $Z_n \ge c$ for $c = a$ and $c = b$
is derived, and the characteristic function of $n$ is obtained. The probability distribution of
$n$ when $z_i$ is normally distributed is derived. These results have application to various
statistical problems and to problems in molecular physics dealing with the random walk of
particles in the presence of absorbing barriers.
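
The absorbing-barrier setting above can be explored with a short Python simulation. This is a sketch under stated assumptions, not the derivation in the paper: the increments are taken to be normal with mean mu and standard deviation sigma, and the closed form is the usual approximation obtained from E[exp(h Z_n)] = 1 when the excess over the barriers is neglected, with h = -2 mu / sigma^2 for normal increments. The function names, the number of walks and the seed are introduced here for illustration.

```python
import math
import random


def wald_upper_hit_probability(mu, sigma, a, b):
    """Approximate probability that the sum stops at the upper barrier a.

    Assumes normal increments and neglects the excess over the barriers
    (a > 0 > b). For mu = 0 the limiting value -b / (a - b) is used.
    """
    if mu == 0:
        return -b / (a - b)
    h = -2.0 * mu / sigma ** 2  # nonzero root of E[exp(h z)] = 1 for normal z
    return (1.0 - math.exp(h * b)) / (math.exp(h * a) - math.exp(h * b))


def simulate_upper_hit_frequency(mu, sigma, a, b, n_walks=4000, seed=0):
    """Fraction of simulated walks that stop with the sum at or above a."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_walks):
        total = 0.0
        while b < total < a:  # continue until the sum leaves the interval (b, a)
            total += rng.gauss(mu, sigma)
        if total >= a:
            hits += 1
    return hits / n_walks
```

The two functions should agree closely when sigma is small relative to a and |b|, which is the situation in which the excess over the barriers can be neglected.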


4. The Approximate Distribution of the Mean and of the Variance of
Independent Variates. P. L. HSU, National University of Peking.


Let $X_i$ be mutually independent random variables with the same cumulative distribution
function; let $E(X_i) = 0$, $E(X_i^2) = 1$ and $E(X_i^4) = \beta$. Finally put $S = n^{-1} \sum_{i=1}^{n} X_i$ and
$\eta = n^{-1} \sum_{i=1}^{n} (X_i - S)^2$. The author first gives a new derivation of H. Cramér's
well-known asymptotic expansions for $\Pr(n^{1/2} S \le x)$. The proof is much more elementary and
avoids in particular the use of M. Riesz' singular integrals. Instead a considerably simpler
Cesàro-type kernel is used, which was first introduced by A. C. Berry (Trans. Amer.
Math. Soc. 49 (1941), pp. 122-136). The same method is then used to derive similar asymptotic
expansions for $\Pr(n^{1/2}(\eta - 1) \le (\beta - 1)^{1/2} x)$. The method can be extended to the case
of unequal components and also to the study of other functions encountered in mathematical
statistics.


5. Ranges and Midranges. E. J. GUMBEL, New School for Social Research.


The $m$th range $w_m$ and the $m$th midrange $v_m$ are defined as the difference and as the
sum of the $m$th extreme value taken in descending magnitude ("from above") and the
$m$th extreme value taken in ascending magnitude ("from below"). The semi-invariant
generating functions $L_m(t)$ and ${}_mL(t)$ of the $m$th extreme values from above and below are
simple generalizations of the semi-invariant generating functions of the largest and of the
smallest value which have been given by R. A. Fisher and L. H. C. Tippett. If the sample
size is large enough the two $m$th extreme values may be considered as independent variates.
Then the semi-invariant generating functions $L_w(t, m)$ and $L_v(t, m)$ of the $m$th range and
of the $m$th midrange are

$$L_w(t, m) = L_m(t) + {}_mL(-t); \qquad L_v(t, m) = L_m(t) + {}_mL(t).$$

If the initial distribution is symmetrical the semi-invariant generating function of the $m$th
range is twice the semi-invariant generating function of the $m$th extreme value from above.
The distribution of the $m$th range is skew, whereas the distribution of the $m$th midrange is
of the generalized, symmetrical, logistic type. The even semi-invariants of the $m$th midrange
are equal to the even semi-invariants of the $m$th range. For increasing indices $m$ the
distributions of the $m$th extremes, of the $m$th ranges and of the $m$th midranges converge toward
normality.
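
The two formulas for the range and the midrange follow from the additivity of semi-invariant generating functions over independent variates. The short derivation below is a sketch that assumes the semi-invariant generating function is the logarithm of the moment generating function; the labels $U$ and $V$ for the $m$th extreme values from above and from below are introduced here and do not appear in the abstract.

```latex
% Assumed definitions: L_m(t) = \log E\, e^{tU}, \quad {}_mL(t) = \log E\, e^{tV},
% with U and V treated as independent (large sample size).
\begin{align*}
L_w(t, m) &= \log E\, e^{t(U - V)}
           = \log E\, e^{tU} + \log E\, e^{-tV}
           = L_m(t) + {}_mL(-t), \\
L_v(t, m) &= \log E\, e^{t(U + V)}
           = \log E\, e^{tU} + \log E\, e^{tV}
           = L_m(t) + {}_mL(t).
\end{align*}
% Replacing t by -t in {}_mL changes the sign of the odd-order coefficients only,
% so the even semi-invariants of the range and of the midrange coincide.
% For a symmetrical initial distribution V is distributed as -U, hence
% {}_mL(-t) = L_m(t) and L_w(t, m) = 2 L_m(t).
```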


6. Statistics of Sensitivity Data, II. Preliminary report. C. W. CHURCHMAN,
Frankford Arsenal, and BENJAMIN EPSTEIN, Westinghouse Electric and Manufacturing Co.


In this paper a study is made of the distribution of the first two moments of sensitivity
data as functions of the sample size. The chief results are briefly these:

(a) The distributions of $\bar{x}$ and $\sigma_x$ (for definition of these functions, see "On the Statistics
of Sensitivity Data," by the authors in the Annals of Mathematical Statistics, Vol.
XV, No. 1) approach normality rapidly as functions of the sample size;

(b) $\bar{x}$ and $\sigma_x$ are "almost" independent even for small sample sizes, thus justifying the
use of Student's ratio in tests of significance for differences between two sample
means.





